Preface



The mathematical area of analysis is often described as the study of limits, 
continuity, and convergence, as in calculus. It is also very much concerned with 
estimates, whether or not a limit is involved. Here we look at several topics 
involving norms on vector spaces and linear mappings, with prerequisites along 
the lines of advanced calculus and basic linear algebra. The main idea is to 
explore intermediate ranges of abstraction and sophistication, without getting 
bogged down with too many technicalities. Lebesgue integrals are not required, 
but could easily be incorporated by readers familiar with that theory. 

We begin with some inequalities related to convexity in the first chapter, 
which can be applied to sums or integrals. The next three chapters focus on 
finite-dimensional vector spaces and linear transformations between them. Some 
properties of infinite sums are described in Chapter 5, as well as a class of 
infinite-dimensional spaces known as $\ell^p$ spaces. The latter give examples of
Banach and Hilbert spaces, which are considered more abstractly in Chapter
6. An important tool for dealing with bounded linear operators on $\ell^p$ or $L^p$
spaces is Marcel Riesz' convexity theorem, presented in Chapter 7. As a
further introduction to real-variable methods in harmonic analysis, estimates 
for dyadic maximal and square functions are discussed in Chapter 8. A brief 
review of some basic notions about metric spaces is included in Appendix A. 

Of course, there are numerous excellent texts on these and related subjects, 
a selection of which can be found in the bibliography. Indeed, it is hoped that 
readers might pursue specific topics more fully, according to their interests. 
Here one might find a few tricks of the trade, or simplified special cases, which 
illustrate broader concepts. I would like to dedicate this book to my fellow 
students from Washington University in St Louis, and to the faculty there, from 
whom we learned a great deal.

Chapter 1

Preliminaries 



1.1 Real and complex numbers 

As usual, the real line is denoted R, the complex plane is denoted C, and the 
set of integers is denoted Z. If A is a subset of R and b is a real number such
that $a \le b$ for all $a \in A$, then b is said to be an upper bound for A. A real
number c is said to be the least upper bound or supremum of A if c is an upper
bound for A and $c \le b$ for every real number b which is an upper bound for A.
One version of the completeness of the real numbers asserts that a nonempty 
subset A of R with an upper bound has a least upper bound. It is easy to see 
from the definition that the supremum sup A of A is unique when it exists. 

Similarly, if $A \subseteq \mathbf{R}$ and $y \in \mathbf{R}$ satisfy $y \le x$ for every $x \in A$, then y is said
to be a lower bound of A. If z is a real number such that z is a lower bound for
A and $y \le z$ for every real number y which is a lower bound for A, then z is said
to be the greatest lower bound or infimum of A. It follows from the completeness
of the real numbers that every nonempty subset A of R with a lower bound has 
a greatest lower bound. This can be obtained as the supremum of the set of 
lower bounds for A, or as the negative of the supremum of 

(1.1)    $-A = \{-a : a \in A\}$.

Again, it is easy to see directly from the definition that the infimum inf A of A 
is unique when it exists. 

It is sometimes convenient to use extended real numbers, which are real
numbers together with $+\infty$, $-\infty$, with standard conventions concerning arithmetic
operations and ordering. More precisely,

(1.2)    $-\infty < x < +\infty$

and

(1.3)    $x + (+\infty) = (+\infty) + x = +\infty, \quad x + (-\infty) = (-\infty) + x = -\infty$

for every $x \in \mathbf{R}$. If x is a positive real number, then

(1.4)    $x \cdot (+\infty) = (+\infty) \cdot x = +\infty, \quad x \cdot (-\infty) = (-\infty) \cdot x = -\infty$,

while the product of $\pm\infty$ with a negative real number changes the sign. The
product of $\pm\infty$ with $\pm\infty$ is defined to be $\pm\infty$, where the signs are multiplied
in the usual way. One can also define $x / \pm\infty$ to be 0 for every $x \in \mathbf{R}$, but
expressions such as $\infty - \infty$, $\infty/\infty$, and $0/0$ are not defined. If one allows
extended real numbers, then every nonempty set $A \subseteq \mathbf{R}$ has a supremum and
an infimum, where $\sup A = +\infty$ if A does not have a finite upper bound, and
$\inf A = -\infty$ if A does not have a finite lower bound. In situations where all
of the quantities of interest are nonnegative, it may be appropriate to interpret
$1/0$ as being equal to $+\infty$.

If a and b are real numbers with $a < b$, then there are four types of intervals
in the real line with endpoints a and b, i.e., the open interval $(a, b)$, the
half-open, half-closed intervals $(a, b]$, $[a, b)$, and the closed interval $[a, b]$.
These four types of intervals are defined as follows:

         $(a, b) = \{x \in \mathbf{R} : a < x < b\}$

         $(a, b] = \{x \in \mathbf{R} : a < x \le b\}$

         $[a, b) = \{x \in \mathbf{R} : a \le x < b\}$

         $[a, b] = \{x \in \mathbf{R} : a \le x \le b\}$.

The length of each of these intervals is defined to be $b - a$, and the length of an
interval I may be denoted $|I|$.

We also consider [a, b] to be defined when a = b, in which event the interval 
consists of a single point and has length equal to 0. For an interval which is 
open at the left endpoint a, we may allow $a = -\infty$, and for an interval which is
open at the right endpoint b, we may allow $b = +\infty$. Hence the real line may be
expressed as $(-\infty, +\infty)$. In these cases, we say that the interval is unbounded,
while an interval with finite endpoints is said to be bounded. 

If x is a real number, then the absolute value of x is denoted $|x|$ and defined
to be x when $x \ge 0$ and to be $-x$ when $x < 0$. Thus $|x|$ is always a nonnegative
real number, $|x| = 0$ if and only if $x = 0$, and

(1.5)    $|x + y| \le |x| + |y|$

and

(1.6)    $|x \cdot y| = |x| \cdot |y|$

for every $x, y \in \mathbf{R}$. These properties are not difficult to verify.

Suppose that $z = x + iy$ is a complex number, where $x, y \in \mathbf{R}$. One may
refer to x, y as the real and imaginary parts of z, denoted $\operatorname{Re} z$, $\operatorname{Im} z$. The
complex conjugate of z is denoted $\bar{z}$ and defined by

(1.7)    $\bar{z} = x - iy$.

It is easy to see that

(1.8)    $\overline{z + w} = \bar{z} + \bar{w}$

and

(1.9)    $\overline{z \cdot w} = \bar{z} \cdot \bar{w}$

for every $z, w \in \mathbf{C}$. Note that

(1.10)    $z + \bar{z} = 2 \operatorname{Re} z$ and $z - \bar{z} = 2i \operatorname{Im} z$

for every $z \in \mathbf{C}$, and that the complex conjugate of $\bar{z}$ is equal to z.

The modulus of $z = x + iy \in \mathbf{C}$, $x, y \in \mathbf{R}$, is denoted $|z|$ and defined to be
the nonnegative real number given by

(1.11)    $|z| = \sqrt{x^2 + y^2}$.

Thus the modulus of z is the same as the absolute value of z when $z \in \mathbf{R}$, and

(1.12)    $|\operatorname{Re} z|, \ |\operatorname{Im} z| \le |z|$

for every $z \in \mathbf{C}$. Of course, the modulus of z is the same as the modulus of the
complex conjugate of z, and it is easy to see that

(1.13)    $|z|^2 = z \cdot \bar{z}$

for every $z \in \mathbf{C}$. This implies that

(1.14)    $|z \cdot w| = |z| \cdot |w|$

for every $z, w \in \mathbf{C}$, because of (1.9).

Similarly, we would like to check that

(1.15)    $|z + w| \le |z| + |w|$

for every $z, w \in \mathbf{C}$. Using (1.13) applied to $z + w$ and then (1.8), we get that

(1.16)    $|z + w|^2 = (z + w)(\bar{z} + \bar{w}) = |z|^2 + z \bar{w} + w \bar{z} + |w|^2$.

We also have that

(1.17)    $z \bar{w} + w \bar{z} = z \bar{w} + \overline{(z \bar{w})} = 2 \operatorname{Re}(z \bar{w}) \le 2 |z \bar{w}| = 2 |z| \, |w|$,

and hence that

(1.18)    $|z + w|^2 \le |z|^2 + 2 |z| \, |w| + |w|^2 = (|z| + |w|)^2$.

This implies (1.15), as desired.
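
As a numerical illustration only (not a proof), the following Python sketch
checks the identity behind (1.16) and (1.17) and the inequality (1.15) on
random samples. The function names, the sampling range, and the seed are
assumptions made for this example.

```python
import random


def expansion_gap(z: complex, w: complex) -> float:
    """|z|^2 + 2 Re(z conj(w)) + |w|^2 - |z + w|^2, which should be 0."""
    cross = 2 * (z * w.conjugate()).real
    return abs(z) ** 2 + cross + abs(w) ** 2 - abs(z + w) ** 2


def triangle_gap(z: complex, w: complex) -> float:
    """(|z| + |w|) - |z + w|, which should be nonnegative by (1.15)."""
    return abs(z) + abs(w) - abs(z + w)


rng = random.Random(0)  # fixed seed so that the run is reproducible
samples = [
    (complex(rng.uniform(-5, 5), rng.uniform(-5, 5)),
     complex(rng.uniform(-5, 5), rng.uniform(-5, 5)))
    for _ in range(1000)
]
# Both quantities are compared with small tolerances because of rounding.
worst_expansion = max(abs(expansion_gap(z, w)) for z, w in samples)
smallest_triangle = min(triangle_gap(z, w) for z, w in samples)
```

Equality in (1.15) occurs, for instance, when w is a nonnegative multiple of z,
so `triangle_gap(1 + 1j, 2 + 2j)` should vanish up to rounding.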

A sequence $\{z_n\}_{n=1}^\infty$ of complex numbers is said to converge to another
complex number z if for every $\epsilon > 0$ there is a positive integer N such that

(1.19)    $|z_n - z| < \epsilon$

for every $n \ge N$. One can check that the limit z of the sequence $\{z_n\}_{n=1}^\infty$ is
unique when it exists, in which case we put

(1.20)    $\lim_{n \to \infty} z_n = z$.

If $\{w_n\}_{n=1}^\infty$, $\{z_n\}_{n=1}^\infty$ are two sequences of complex numbers which converge
to the complex numbers w, z, respectively, then the sequences $\{w_n + z_n\}_{n=1}^\infty$,
$\{w_n \cdot z_n\}_{n=1}^\infty$ of sums and products converge to the sum $w + z$ and product $w \cdot z$
of the limits, respectively. A sequence $\{z_n\}_{n=1}^\infty$ of complex numbers converges
to a complex number z if and only if the sequences of real and imaginary parts
of the $z_n$'s converge to the real and imaginary parts of z.

Let $\{x_n\}_{n=1}^\infty$ be a sequence of real numbers which is monotone increasing,
which is to say that $x_n \le x_{n+1}$ for each n. One can check that $\{x_n\}_{n=1}^\infty$ converges
if and only if the set of $x_n$'s has an upper bound, in which case the limit of the
sequence is equal to the supremum of this set. For any sequence $\{x_n\}_{n=1}^\infty$ of
real numbers,

(1.21)    $x_n \to +\infty$ as $n \to \infty$

if for each $L > 0$ there is a positive integer N such that

(1.22)    $x_n > L$

for every $n \ge N$. If $\{x_n\}_{n=1}^\infty$ is an unbounded monotone increasing sequence of
real numbers, then $x_n \to +\infty$ as $n \to \infty$. Similar remarks apply to monotone
decreasing sequences of real numbers.

Let $\{a_n\}_{n=1}^\infty$ be a sequence of real numbers. For each positive integer k, put

(1.23)    $A_k = \sup\{a_n : n \ge k\}$,

which may be $+\infty$. Thus $A_{k+1} \le A_k$ for every k. The upper limit of $\{a_n\}_{n=1}^\infty$ is
denoted $\limsup_{n \to \infty} a_n$ and defined to be the infimum of the $A_k$'s, which may
be $\pm\infty$. Similarly, if

(1.24)    $B_l = \inf\{a_n : n \ge l\}$,

then $B_l \le B_{l+1}$ for every l, and the lower limit $\liminf_{n \to \infty} a_n$ of $\{a_n\}_{n=1}^\infty$ is
defined to be the supremum of the $B_l$'s. By construction, $B_l \le A_k$ for every k
and l, and hence

(1.25)    $\liminf_{n \to \infty} a_n \le \limsup_{n \to \infty} a_n$.

One can check that $a_n \to a$ as $n \to \infty$ if and only if

(1.26)    $\liminf_{n \to \infty} a_n = \limsup_{n \to \infty} a_n = a$.
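
To make the tail suprema and infima in (1.23) and (1.24) concrete, here is a
small Python sketch that computes them for a finite list of terms. This is only
an approximation of the definitions, since the true $A_k$ and $B_l$ involve the
whole infinite tail; the function names and the sample sequence
$a_n = (-1)^n (1 + 1/n)$ are assumptions chosen for illustration.

```python
def tail_sups(a):
    """Return [max(a[k:]) for each k], computed in one pass from the right."""
    out = []
    running = float("-inf")
    for x in reversed(a):
        running = max(running, x)
        out.append(running)
    return out[::-1]


def tail_infs(a):
    """Return [min(a[k:]) for each k], computed in one pass from the right."""
    out = []
    running = float("inf")
    for x in reversed(a):
        running = min(running, x)
        out.append(running)
    return out[::-1]


# Terms a_1, ..., a_8 of a_n = (-1)^n (1 + 1/n).
a = [(-1) ** n * (1 + 1 / n) for n in range(1, 9)]
sups = tail_sups(a)  # finite stand-ins for A_1, A_2, ...
infs = tail_infs(a)  # finite stand-ins for B_1, B_2, ...
```

For this sequence the upper limit is 1 and the lower limit is -1, so the
sequence does not converge, in line with (1.26). The computed tail suprema
decrease and the tail infima increase, as stated after (1.23) and (1.24).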

A sequence $\{z_n\}_{n=1}^\infty$ of complex numbers is said to be a Cauchy sequence if
for every $\epsilon > 0$ there is a positive integer N such that

(1.27)    $|z_l - z_n| < \epsilon$

for each $l, n \ge N$. It is easy to see that $\{z_n\}_{n=1}^\infty$ is a Cauchy sequence if and only
if the corresponding sequences of real and imaginary parts of the $z_n$'s are Cauchy
sequences, and that convergent sequences are automatically Cauchy sequences.
It is not difficult to show that the upper and lower limits of a Cauchy sequence
of real numbers are finite and equal, and hence that every Cauchy sequence
of real numbers converges. It follows that every Cauchy sequence of complex
numbers converges too.

An infinite series $\sum_{j=0}^\infty a_j$ of complex numbers is said to converge if the
corresponding sequence of partial sums $\sum_{j=0}^n a_j$ converges, in which case the
sum of the series is defined to be the limit of the sequence of partial sums. If
$\sum_{j=0}^\infty a_j$ converges, then

(1.28)    $\lim_{j \to \infty} a_j = 0$.

The partial sums of an infinite series whose terms are nonnegative real numbers
are monotone increasing, and therefore the series converges if and only if the
partial sums are bounded. An infinite series $\sum_{j=0}^\infty a_j$ of complex numbers is
said to converge absolutely if

(1.29)    $\sum_{j=0}^\infty |a_j|$

converges. One can check that the partial sums of an absolutely convergent
series form a Cauchy sequence, and therefore converge.

If A is a subset of a set X, then $\mathbf{1}_A(x)$ denotes the indicator function of A
on X. This is the function equal to 1 when $x \in A$ and to 0 when $x \in X \setminus A$, and
it is sometimes called the characteristic function associated to A. A function 
on the real line, or on an interval in the real line, is called a step function if it 
is a finite linear combination of indicator functions of intervals. Equivalently, 
this means that there is a finite partition of the domain into intervals on which 
the function is constant. In this book, one is normally welcome to restrict one's 
attention to functions on the real line that are step functions, at least in the 
context of integrating functions on R. Step functions are convenient because 
their integrals can be reduced immediately to finite sums. Results about other 
functions can often be derived from those for step functions by approximation. 
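
The remark that integrals of step functions reduce to finite sums can be
sketched in Python as follows. The representation of a step function as a list
of triples (a, b, c), meaning the term c times the indicator function of
$[a, b)$, is an assumption made for this example, as are the function names.

```python
def integrate_step(pieces):
    """Integral of sum_i c_i * 1_{[a_i, b_i)}: the finite sum of c_i * (b_i - a_i)."""
    return sum(c * (b - a) for a, b, c in pieces)


def evaluate_step(pieces, x):
    """Value at x of the step function described by the triples (a, b, c)."""
    return sum(c for a, b, c in pieces if a <= x < b)


# f = 2 on [0, 1) and 3 on [1, 3), and 0 elsewhere.
f = [(0.0, 1.0, 2.0), (1.0, 3.0, 3.0)]
total = integrate_step(f)  # 2 * 1 + 3 * 2
```

The intervals in the list are allowed to overlap, since both functions are
linear in the coefficients. Changing the convention at the endpoints would
change `evaluate_step` at finitely many points but not the integral.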

1.2 Convex functions 

Let I be an open interval in the real line, which may be unbounded. A
real-valued function $\phi(x)$ on I is said to be convex if

(1.30)    $\phi(\lambda x + (1 - \lambda) y) \le \lambda \phi(x) + (1 - \lambda) \phi(y)$

for every $x, y \in I$ and $\lambda \in [0, 1]$.

If $\phi(x)$ is an affine function, which is to say that $\phi(x) = a x + b$ for some real
numbers a and b, then $\phi(x)$ is a convex function on the whole real line, with
equality in (1.30) for all x, y, and $\lambda$. Equivalently, both $\phi(x)$ and $-\phi(x)$ are
convex, which characterizes affine functions. It is easy to see that $\phi(x) = |x|$
is a convex function on the whole real line too. If $\phi(x)$ is an arbitrary convex
function on I, and if c is a real number, then the translation $\phi(x - c)$ of $\phi(x)$ is
a convex function on

(1.31)    $I + c = \{x + c : x \in I\}$.

In particular, for each real number c, $|x - c|$ defines a convex function on R.

Lemma 1.32 A real-valued function $\phi(x)$ on I is convex if and only if

(1.33)    $\frac{\phi(t) - \phi(s)}{t - s} \le \frac{\phi(u) - \phi(s)}{u - s} \le \frac{\phi(u) - \phi(t)}{u - t}$

for every $s, t, u \in I$ with $s < t < u$.

If s, t, u are as in the lemma, then

(1.34)    $t = \frac{t - s}{u - s} \, u + \frac{u - t}{u - s} \, s$,

where

(1.35)    $0 < \frac{t - s}{u - s} < 1$

and

(1.36)    $\frac{u - t}{u - s} = 1 - \frac{t - s}{u - s}$.

If $\phi(x)$ is convex, then

(1.37)    $\phi(t) \le \frac{t - s}{u - s} \, \phi(u) + \frac{u - t}{u - s} \, \phi(s)$.

One can rewrite this in two different ways to get (1.33). Conversely, one can
work backwards, and rewrite either of the inequalities in (1.33) to get (1.37),
which gives (1.30) when s, t, and u correspond to x, y, and $\lambda$ as in (1.34).
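
The three difference quotients in (1.33) are easy to compute, and the following
Python sketch tests the chain of inequalities for a given triple $s < t < u$.
It is a numerical illustration under assumed names and an assumed rounding
tolerance, not a substitute for the lemma.

```python
def three_slopes(phi, s, t, u):
    """The three difference quotients in (1.33), for s < t < u."""
    return (
        (phi(t) - phi(s)) / (t - s),
        (phi(u) - phi(s)) / (u - s),
        (phi(u) - phi(t)) / (u - t),
    )


def satisfies_three_slopes(phi, s, t, u, tol=1e-12):
    """True when the slopes are nondecreasing from left to right, up to tol."""
    left, middle, right = three_slopes(phi, s, t, u)
    return left <= middle + tol and middle <= right + tol


convex_ok = satisfies_three_slopes(abs, -1.0, 0.5, 2.0)
concave_ok = satisfies_three_slopes(lambda x: -x * x, -1.0, 0.0, 1.0)
```

For $\phi(x) = |x|$ and the triple (-1, 0.5, 2) the slopes are -1/3, 1/3, and 1,
while for the concave function $-x^2$ and the triple (-1, 0, 1) they are 1, 0,
and -1, so the test fails, as it should.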

Lemma 1.38 A function $\phi(x)$ on I is convex if and only if for each $t \in I$
there is a real-valued affine function $A(x)$ on R such that $A(t) = \phi(t)$ and
$A(x) \le \phi(x)$ for every $x \in I$.

To see that this condition is sufficient for $\phi$ to be convex, let x, y, and $\lambda$ be
given in the usual way. If A is an affine function associated to

(1.39)    $t = \lambda x + (1 - \lambda) y$

as in the statement of the lemma, then

(1.40)    $\phi(\lambda x + (1 - \lambda) y) = A(\lambda x + (1 - \lambda) y)$
                    $= \lambda A(x) + (1 - \lambda) A(y)$
                    $\le \lambda \phi(x) + (1 - \lambda) \phi(y)$.

Conversely, suppose that $\phi(x)$ is convex, and let $t \in I$ be given. We would
like to choose a real number a so that

(1.41)    $A(x) = \phi(t) + a (x - t)$

satisfies $A(x) \le \phi(x)$ for all $x \in I$, which is equivalent to

(1.42)    $a (x - t) \le \phi(x) - \phi(t)$

for $x \in I$. This is trivial when $x = t$, and otherwise we can rewrite (1.42) as

(1.43)    $a \le \frac{\phi(x) - \phi(t)}{x - t}$

when $x > t$, and as

(1.44)    $\frac{\phi(t) - \phi(x)}{t - x} \le a$

when $x < t$. It follows from Lemma 1.32 that

(1.45)    $\frac{\phi(t) - \phi(s)}{t - s} \le \frac{\phi(u) - \phi(t)}{u - t}$

for every $s, u \in I$ such that $s < t < u$. Hence

(1.46)    $D_l = \sup \Bigl\{ \frac{\phi(t) - \phi(s)}{t - s} : s \in I, \ s < t \Bigr\}$

and

(1.47)    $D_r = \inf \Bigl\{ \frac{\phi(u) - \phi(t)}{u - t} : u \in I, \ u > t \Bigr\}$

are well-defined and satisfy

(1.48)    $D_l \le D_r$.

To get (1.43) and (1.44), it suffices to choose $a \in \mathbf{R}$ such that

(1.49)    $D_l \le a \le D_r$.

This completes the proof of Lemma 1.38.

A real-valued function $\phi(x)$ on I is said to be strictly convex if

(1.50)    $\phi(\lambda x + (1 - \lambda) y) < \lambda \phi(x) + (1 - \lambda) \phi(y)$

for every $x, y \in I$ such that $x \ne y$ and each $\lambda \in (0, 1)$.

Lemma 1.51 A real-valued function $\phi$ on I is strictly convex if and only if for
every point $t \in I$ there is a real-valued affine function $A(x)$ on R such that
$A(t) = \phi(t)$ and $A(x) < \phi(x)$ for all $x \in I \setminus \{t\}$.

This is the analogue of Lemma 1.38 for strictly convex functions, which can
be obtained in practically the same manner as before. For the existence of A
when $\phi$ is strictly convex, one can start with A as in the previous lemma, and
use strict convexity to show that $A(x) \ne \phi(x)$, and hence $A(x) < \phi(x)$, when $x \ne t$.

The convexity of a real-valued function $\phi$ on I can also be characterized by
the property that for each $x, y \in I$ with $x < y$,

(1.52)    $\phi(t) \le B(t)$ for every $t \in [x, y]$,

where B is the affine function on the real line which is equal to $\phi$ at x and y.
Strict convexity corresponds to

(1.53)    $\phi(t) < B(t)$ when $t \in (x, y)$.

This is easy to check, just using the definitions.

Note that convex functions are automatically continuous. This follows by
trapping a convex function on both sides of a point between affine functions
with the same value at that point.

Lemma 1.54 If $\phi$ is a continuous real-valued function on I, and if for each x,
y in I there is a $\lambda_{x,y} \in (0, 1)$ such that (1.30) holds with $\lambda = \lambda_{x,y}$, then $\phi$ is
convex.

This is often stated in the special case where $\lambda_{x,y} = 1/2$ for every $x, y \in I$, in
which event one can iterate the condition and pass to a limit to get the desired
inequality for arbitrary $\lambda$. Alternatively, for each $x, y \in I$ with $x < y$, let $L(x, y)$
be the set of $\lambda \in [0, 1]$ such that (1.30) holds, which is a closed set when $\phi$ is
continuous, and which automatically contains 0 and 1. If $L(x, y) \ne [0, 1]$, then
one can get a contradiction under the conditions of the lemma, by considering
a maximal open interval in $[0, 1] \setminus L(x, y)$, and showing that it has to contain an
element of $L(x, y)$.

1.3 Some related inequalities

Suppose that $\phi$ is a convex function on an open interval $I \subseteq \mathbf{R}$, as in the previous
section. If K is an interval in R of positive length and f is an integrable function
on K such that $f(x) \in I$ for all $x \in K$, then

(1.55)    $|K|^{-1} \int_K f(x) \, dx \in I$

and

(1.56)    $\phi\Bigl( |K|^{-1} \int_K f(x) \, dx \Bigr) \le |K|^{-1} \int_K \phi(f(x)) \, dx$.

This is called Jensen's inequality.

Let us first consider the analogous statement for finite sums. If $x_1, x_2, \ldots, x_n$
are elements of I and $\lambda_1, \lambda_2, \ldots, \lambda_n$ are nonnegative real numbers such that
$\sum_{i=1}^n \lambda_i = 1$, then

(1.57)    $\sum_{i=1}^n \lambda_i x_i \in I$

and

(1.58)    $\phi\Bigl( \sum_{i=1}^n \lambda_i x_i \Bigr) \le \sum_{i=1}^n \lambda_i \phi(x_i)$.

This is the same as (1.30) when $n = 2$, and one can apply (1.30) repeatedly
to get the general case. One can also use the characterization of convexity in
Lemma 1.38, as in (1.40). If f is a step function, then (1.56) follows directly from
(1.58). In general, one can reduce to the case of finite sums through suitable
approximations, or employ Lemma 1.38 in the same way as for sums.
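
The finite form (1.58) of Jensen's inequality can be illustrated numerically.
In the Python sketch below, the function name `jensen_gap` and the sample
weights are assumptions for the example; the gap is the right side of (1.58)
minus the left side, which should be nonnegative for convex $\phi$.

```python
import math


def jensen_gap(phi, xs, lams):
    """sum_i lam_i * phi(x_i) - phi(sum_i lam_i * x_i) for weights summing to 1."""
    if any(lam < 0 for lam in lams) or abs(sum(lams) - 1.0) > 1e-12:
        raise ValueError("weights must be nonnegative and sum to 1")
    mean = sum(lam * x for lam, x in zip(lams, xs))
    return sum(lam * phi(x) for lam, x in zip(lams, xs)) - phi(mean)


gap_exp = jensen_gap(math.exp, [0.0, 1.0, 2.0], [0.2, 0.3, 0.5])
gap_square = jensen_gap(lambda t: t * t, [0.0, 2.0], [0.5, 0.5])
```

For $\phi(t) = t^2$ the gap is the variance of the weighted points, which is 1 in
the second example. For an affine function the gap is 0, consistent with the
equality case of (1.30).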

It is well known that $\phi(t) = |t|^p$ is a convex function on the real line when p
is a real number such that $p \ge 1$, and moreover that $|t|^p$ is strictly convex when
$p > 1$. In particular,

(1.59)    $\Bigl| \, |K|^{-1} \int_K f(x) \, dx \Bigr|^p \le |K|^{-1} \int_K |f(x)|^p \, dx$

for real-valued functions f on an interval K of positive length.
Let p, q be real numbers such that $p, q > 1$ and

(1.60)    $\frac{1}{p} + \frac{1}{q} = 1$.

In this event we say that p and q are conjugate exponents. If f, g are nonnegative
real-valued functions on an interval K, then Hölder's inequality states that

(1.61)    $\int_K f(x) g(x) \, dx \le \Bigl( \int_K f(y)^p \, dy \Bigr)^{1/p} \Bigl( \int_K g(z)^q \, dz \Bigr)^{1/q}$.

We can also allow p or q to be 1 and the other to be $+\infty$, which is consistent
with (1.60). If $p = 1$ and $q = +\infty$, then the substitute for (1.61) is

(1.62)    $\int_K f(x) g(x) \, dx \le \Bigl( \int_K f(y) \, dy \Bigr) \Bigl( \sup_{z \in K} g(z) \Bigr)$.

Let us now prove (1.61) when $p, q > 1$, beginning with some initial
reductions. The inequality is trivial if f or g is identically 0, or zero "almost
everywhere", since the left side of (1.61) is then equal to 0. Thus we may suppose
that

(1.63)    $\Bigl( \int_K f(y)^p \, dy \Bigr)^{1/p}$ and $\Bigl( \int_K g(z)^q \, dz \Bigr)^{1/q}$

are nonzero. We may suppose further that these expressions are both equal to
1, because the general case would follow by multiplying f and g by positive
constants. For any nonnegative real numbers s, t,

(1.64)    $s t \le \frac{s^p}{p} + \frac{t^q}{q}$.

This is a version of the geometric-arithmetic mean inequalities, which can be
treated as an exercise in calculus, or derived from the convexity of the
exponential function. Note that the inequality is strict when $s^p \ne t^q$. Applying (1.64)
to $s = f(x)$ and $t = g(x)$ and then integrating in x, we get that

(1.65)    $\int_K f(x) g(x) \, dx \le \frac{1}{p} \int_K f(x)^p \, dx + \frac{1}{q} \int_K g(x)^q \, dx$.

This implies (1.61) when the integrals of $f^p$ and $g^q$ are equal to 1, as desired.
Similarly,

(1.66)    $\sum_{j=1}^n a_j b_j \le \Bigl( \sum_{k=1}^n a_k^p \Bigr)^{1/p} \Bigl( \sum_{l=1}^n b_l^q \Bigr)^{1/q}$

when $a_1, \ldots, a_n$, $b_1, \ldots, b_n$ are nonnegative real numbers and $p, q > 1$ are
conjugate exponents. If $p = 1$ and $q = \infty$, then this should be interpreted as

(1.67)    $\sum_{j=1}^n a_j b_j \le \Bigl( \sum_{k=1}^n a_k \Bigr) \bigl( \max\{b_l : 1 \le l \le n\} \bigr)$.
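
Hölder's inequality for finite sums, in the forms (1.66) and (1.67), can be
checked numerically with a short Python sketch. The helper names below are
assumptions for this illustration, and infinity is represented by `math.inf`.

```python
import math


def conjugate_exponent(p):
    """The exponent q with 1/p + 1/q = 1, allowing p = 1 and p = infinity."""
    if p == 1:
        return math.inf
    if p == math.inf:
        return 1.0
    return p / (p - 1)


def p_norm(a, p):
    """(sum |a_j|^p)^(1/p) for finite p, and max |a_j| for p = infinity."""
    if p == math.inf:
        return max(abs(x) for x in a)
    return sum(abs(x) ** p for x in a) ** (1 / p)


def holder_sides(a, b, p):
    """Left and right sides of (1.66), or of (1.67) when p is 1 or infinity."""
    q = conjugate_exponent(p)
    return sum(x * y for x, y in zip(a, b)), p_norm(a, p) * p_norm(b, q)


lhs, rhs = holder_sides([1.0, 2.0, 3.0], [4.0, 0.0, 1.0], 1.5)
```

When $p = q = 2$ and the two lists coincide, both sides equal the sum of the
squares, which is a simple case of equality.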

Let f and g be nonnegative functions on an interval K again, and let p be
a real number, $p \ge 1$. Minkowski's inequality states that

(1.68)    $\Bigl( \int_K (f(x) + g(x))^p \, dx \Bigr)^{1/p} \le \Bigl( \int_K f(x)^p \, dx \Bigr)^{1/p} + \Bigl( \int_K g(x)^p \, dx \Bigr)^{1/p}$.

The analogue of (1.68) for $p = +\infty$ is the elementary inequality

(1.69)    $\sup_{x \in K} (f(x) + g(x)) \le \sup_{x \in K} f(x) + \sup_{x \in K} g(x)$.

Let us suppose that $1 < p < +\infty$, since (1.68) is trivial when $p = 1$. We
begin with

(1.70)    $\int_K (f(x) + g(x))^p \, dx = \int_K f(x) (f(x) + g(x))^{p-1} \, dx + \int_K g(x) (f(x) + g(x))^{p-1} \, dx$.

If $q > 1$ is the conjugate exponent of p, then Hölder's inequality implies that

(1.71)    $\int_K f(x) (f(x) + g(x))^{p-1} \, dx$
                    $\le \Bigl( \int_K f(y)^p \, dy \Bigr)^{1/p} \Bigl( \int_K (f(z) + g(z))^{q(p-1)} \, dz \Bigr)^{1/q}$
                    $= \Bigl( \int_K f(y)^p \, dy \Bigr)^{1/p} \Bigl( \int_K (f(z) + g(z))^p \, dz \Bigr)^{1 - 1/p}$,

using $q (p - 1) = p$ and $1/q = 1 - 1/p$.

There is an analogous estimate for $\int_K g(x) (f(x) + g(x))^{p-1} \, dx$, which leads to

(1.72)    $\int_K (f(x) + g(x))^p \, dx$
                    $\le \Bigl\{ \Bigl( \int_K f(y)^p \, dy \Bigr)^{1/p} + \Bigl( \int_K g(y)^p \, dy \Bigr)^{1/p} \Bigr\} \Bigl( \int_K (f(z) + g(z))^p \, dz \Bigr)^{1 - 1/p}$.

It is easy to derive (1.68) from this, by dividing both sides by the last factor
when it is positive and finite.

Minkowski's inequality for finite sums can be expressed as

(1.73)    $\Bigl( \sum_{j=1}^n (a_j + b_j)^p \Bigr)^{1/p} \le \Bigl( \sum_{j=1}^n a_j^p \Bigr)^{1/p} + \Bigl( \sum_{j=1}^n b_j^p \Bigr)^{1/p}$

when $1 \le p < \infty$, and

(1.74)    $\max\{a_j + b_j : 1 \le j \le n\} \le \max\{a_j : 1 \le j \le n\} + \max\{b_j : 1 \le j \le n\}$

when $p = \infty$, where $a_1, \ldots, a_n$, $b_1, \ldots, b_n$ are nonnegative real numbers. These
inequalities can be shown in the same way as for integrals. As an alternate
approach, fix p, $1 < p < \infty$, since the $p = 1$ and $p = \infty$ cases are easy, and
suppose for the moment that

(1.75)    $\Bigl( \sum_{j=1}^n a_j^p \Bigr)^{1/p} = \Bigl( \sum_{j=1}^n b_j^p \Bigr)^{1/p} = 1$.

If t is a real number such that $0 \le t \le 1$, then

(1.76)    $\Bigl( \sum_{j=1}^n (t a_j + (1 - t) b_j)^p \Bigr)^{1/p} \le 1$.

To see this, rewrite (1.75) and (1.76) as

(1.77)    $\sum_{j=1}^n a_j^p = \sum_{j=1}^n b_j^p = 1$

and

(1.78)    $\sum_{j=1}^n (t a_j + (1 - t) b_j)^p \le 1$,

respectively. To go from (1.77) to (1.78), it suffices to know that

(1.79)    $(t a_j + (1 - t) b_j)^p \le t a_j^p + (1 - t) b_j^p$

for each j, which follows from the convexity of the function $\phi(x) = x^p$, $x \ge 0$.
Once one has (1.76) under the assumption (1.75), it is not difficult to
derive (1.73) in the general case. Basically, the parameter t compensates for
$\bigl( \sum_{j=1}^n a_j^p \bigr)^{1/p}$ and $\bigl( \sum_{j=1}^n b_j^p \bigr)^{1/p}$ not being equal.
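
One way to carry out the reduction from the normalized case (1.76) to the
general case (1.73) is sketched below. The symbols A, B, $a_j'$, $b_j'$ are
introduced only for this sketch, and it is assumed that A and B are both
positive, since (1.73) is immediate when all of the $a_j$'s or all of the
$b_j$'s are equal to 0.

```latex
\begin{align*}
  A &= \Bigl(\sum_{j=1}^n a_j^p\Bigr)^{1/p}, \qquad
  B = \Bigl(\sum_{j=1}^n b_j^p\Bigr)^{1/p}, \qquad
  a_j' = \frac{a_j}{A}, \quad b_j' = \frac{b_j}{B}, \quad
  t = \frac{A}{A + B}, \\
  % the normalized sequences satisfy (1.75), and 1 - t = B / (A + B)
  t \, a_j' + (1 - t) \, b_j' &= \frac{a_j}{A + B} + \frac{b_j}{A + B}
     = \frac{a_j + b_j}{A + B}, \\
  % apply (1.76) to a_j', b_j' with this choice of t
  \Bigl(\sum_{j=1}^n \Bigl(\frac{a_j + b_j}{A + B}\Bigr)^p\Bigr)^{1/p} \le 1
  \quad &\Longleftrightarrow \quad
  \Bigl(\sum_{j=1}^n (a_j + b_j)^p\Bigr)^{1/p} \le A + B.
\end{align*}
```

The last line is exactly (1.73), and the choice $t = A/(A + B)$ is what makes
the weighted combination of the normalized sequences proportional to $a_j + b_j$.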
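Fix a positive integer n, and suppose that $\{a_j\}_{j=1}^n$ is a finite sequence of
nonnegative real numbers. Let p and q be positive real numbers, with $p < q$.
Clearly

(1.80)    $\max_{1 \le j \le n} a_j \le \Bigl( \sum_{j=1}^n a_j^p \Bigr)^{1/p}$.

Moreover,

(1.81)    $\Bigl( \sum_{j=1}^n a_j^q \Bigr)^{1/q} \le \Bigl( \sum_{j=1}^n a_j^p \Bigr)^{1/p}$,

because

(1.82)    $\sum_{j=1}^n a_j^q \le \Bigl( \max_{1 \le k \le n} a_k \Bigr)^{q-p} \Bigl( \sum_{l=1}^n a_l^p \Bigr) \le \Bigl( \sum_{r=1}^n a_r^p \Bigr)^{1 + (q-p)/p}$

and $1 + (q - p)/p = q/p$.

In the other direction,

(1.83)    $\Bigl( \sum_{j=1}^n a_j^p \Bigr)^{1/p} \le n^{1/p} \max_{1 \le j \le n} a_j$

and

(1.84)    $\Bigl( \sum_{j=1}^n a_j^p \Bigr)^{1/p} \le n^{(1/p) - (1/q)} \Bigl( \sum_{j=1}^n a_j^q \Bigr)^{1/q}$.

The first inequality is trivial, and the second can be rewritten as

(1.85)    $\Bigl( \frac{1}{n} \sum_{j=1}^n a_j^p \Bigr)^{q/p} \le \frac{1}{n} \sum_{j=1}^n a_j^q$,

which is an instance of (1.58) applied to $\phi(x) = x^{q/p}$.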
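
The comparisons (1.80), (1.81), (1.83), and (1.84) fit together into a single
chain, which the following Python sketch evaluates for a given list and
exponents $0 < p < q$. The function names are assumptions for this
illustration; the last link of the chain is (1.83) applied with q in place of p.

```python
def power_sum_norm(a, p):
    """(sum_j a_j^p)^(1/p) for nonnegative a_j and p > 0."""
    return sum(x ** p for x in a) ** (1 / p)


def comparison_chain(a, p, q):
    """Five quantities that should be nondecreasing when 0 < p < q."""
    n = len(a)
    biggest = max(a)
    return (
        biggest,                                        # left side of (1.80)
        power_sum_norm(a, q),                           # left side of (1.81)
        power_sum_norm(a, p),                           # right side of (1.81)
        n ** (1 / p - 1 / q) * power_sum_norm(a, q),    # right side of (1.84)
        n ** (1 / p) * biggest,                         # right side of (1.83)
    )


chain = comparison_chain([1.0, 2.0, 3.0, 4.0], 1.5, 3.0)
constant_chain = comparison_chain([2.0, 2.0, 2.0], 1.0, 2.0)
```

For a constant list the last three quantities coincide, which shows that the
power of n in (1.84) cannot be improved in general.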

If $0 < p \le 1$ and u, v are nonnegative real numbers, then

(1.86)    $(u + v)^p \le u^p + v^p$.

This is a special case of (1.81), with $q = 1$ and $n = 2$. This leads to

(1.87)    $\sum_{j=1}^n (b_j + c_j)^p \le \sum_{j=1}^n b_j^p + \sum_{j=1}^n c_j^p$

for nonnegative real numbers $b_1, \ldots, b_n$ and $c_1, \ldots, c_n$, and

(1.88)    $\int_K (f(x) + g(x))^p \, dx \le \int_K f(x)^p \, dx + \int_K g(x)^p \, dx$

for nonnegative functions f, g on an interval K.




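Suppose that $0 < p, q, r < \infty$ and

(1.89)    $\frac{1}{r} = \frac{1}{p} + \frac{1}{q}$.

If $a_1, \ldots, a_n$, $b_1, \ldots, b_n$ are nonnegative real numbers, then

(1.90)    $\Bigl( \sum_{j=1}^n (a_j b_j)^r \Bigr)^{1/r} \le \Bigl( \sum_{j=1}^n a_j^p \Bigr)^{1/p} \Bigl( \sum_{j=1}^n b_j^q \Bigr)^{1/q}$.

This follows from Hölder's inequality, applied to $a_j^r$, $b_j^r$ with the conjugate
exponents $p/r$, $q/r$. Similarly, for nonnegative functions f, g on an interval K,

(1.91)    $\Bigl( \int_K (f(x) g(x))^r \, dx \Bigr)^{1/r} \le \Bigl( \int_K f(x)^p \, dx \Bigr)^{1/p} \Bigl( \int_K g(x)^q \, dx \Bigr)^{1/q}$.

One can also allow for infinite exponents in the usual way.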



Chapter 2 

Norms on vector spaces 



In this book, all vector spaces use the real or complex numbers as their under- 
lying scalar field. We may sometimes wish to restrict ourselves to one or the 
other, but frequently both are fine. Let us make the standing assumption that 
all vector spaces are finite-dimensional in this and the next two chapters. 



2.1 Definitions and examples 

Let V be a real or complex vector space. By a norm on V we mean a nonnegative
real-valued function $\| \cdot \|$ on V such that $\|v\| = 0$ if and only if v is the zero
vector in V,

(2.1)    $\|t v\| = |t| \, \|v\|$

for every $v \in V$ and $t \in \mathbf{R}$ or $\mathbf{C}$, as appropriate, and

(2.2)    $\|v + w\| \le \|v\| + \|w\|$

for every $v, w \in V$. As a basic class of examples, let V be $\mathbf{R}^n$ or $\mathbf{C}^n$, and
consider

(2.3)    $\|v\|_p = \Bigl( \sum_{j=1}^n |v_j|^p \Bigr)^{1/p}$

when $1 \le p < \infty$, and

(2.4)    $\|v\|_\infty = \max_{1 \le j \le n} |v_j|$.

The triangle inequality for these norms follows from (1.73) and (1.74).

If

(2.5)    $B_1 = \{v \in V : \|v\| \le 1\}$

is the closed unit ball corresponding to a norm $\|v\|$ on V, then it is easy to see
that $B_1$ is a convex set in V. This means that

(2.6)    $t v + (1 - t) w \in B_1$

whenever $v, w \in B_1$ and t is a real number such that $0 \le t \le 1$, which follows
from (2.1) and (2.2). Conversely, if $\|v\|$ is a nonnegative real-valued function on
V such that $\|v\| = 0$ if and only if $v = 0$, $\|v\|$ satisfies (2.1), and the unit ball
$B_1$ is convex, then one can show $\|v\|$ also satisfies (2.2), and hence that $\|v\|$ is a
norm on V. In effect, this was mentioned already in Section 1.3, as an alternate
approach to Minkowski's inequality for finite sums.

If V is a vector space, and $\| \cdot \|$ is a norm on V, then

(2.7)    $\bigl| \, \|v\| - \|w\| \, \bigr| \le \|v - w\|$

for every $v, w \in V$. This follows from

(2.8)    $\|v\| \le \|w\| + \|v - w\|$

and the analogous inequality with the roles of v and w interchanged. Suppose
that $V = \mathbf{R}^n$ or $\mathbf{C}^n$ for some positive integer n, which is not a real restriction,
since every real or complex vector space of positive finite dimension is isomorphic
to one of these. Let $|x|$ denote the standard Euclidean norm on $\mathbf{R}^n$ or $\mathbf{C}^n$, which
is the same as the norm $\|x\|_2$ in (2.3). One can check that there is a positive
constant C such that

(2.9)    $\|v\| \le C \, |v|$

for every $v \in V$, by expanding v in the standard basis for $V = \mathbf{R}^n$ or $\mathbf{C}^n$,
and using the triangle inequality and homogeneity of $\|v\|$. This and (2.7) imply
that $\|v\|$ is a continuous real-valued function on V, with respect to the standard
Euclidean metric and topology. Thus the minimum b of $\|v\|$ among the
vectors $v \in V$ with $|v| = 1$ is attained, by well-known results about continuity
and compactness, and $b > 0$. It follows that

(2.10)    $b \, |v| \le \|v\|$

for every $v \in V$, because of the homogeneity property of the norms $\|v\|$ and $|v|$.
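
The constants in (2.9) and (2.10) can be explored numerically. In the Python
sketch below, C is taken to be the sum of the norms of the standard basis
vectors, which is what the expansion argument above gives, since each
coordinate satisfies $|v_j| \le |v|$. The constant b is only estimated from
above by sampling unit vectors. The function names, the choice of $\|v\|_1$ as
the test norm, and the sampling scheme are assumptions for this illustration.

```python
import math
import random


def euclidean(v):
    """The standard Euclidean norm |v|."""
    return math.sqrt(sum(abs(x) ** 2 for x in v))


def norm_1(v):
    """The norm ||v||_1 from (2.3) with p = 1."""
    return sum(abs(x) for x in v)


def upper_constant(norm, n):
    """C = sum_j ||e_j||, so that ||v|| <= C |v| as in (2.9)."""
    total = 0.0
    for j in range(n):
        e_j = [1.0 if i == j else 0.0 for i in range(n)]
        total += norm(e_j)
    return total


def sampled_lower_constant(norm, n, trials=2000, seed=0):
    """Smallest ||v|| seen on random Euclidean unit vectors (an upper estimate of b)."""
    rng = random.Random(seed)
    best = math.inf
    for _ in range(trials):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        length = euclidean(v)
        if length == 0.0:
            continue
        best = min(best, norm([x / length for x in v]))
    return best


C = upper_constant(norm_1, 3)
b_estimate = sampled_lower_constant(norm_1, 3)
```

For $\|v\|_1$ on $\mathbf{R}^3$ this gives C = 3, while the best constants are
$\sqrt{3}$ in (2.9) and 1 in (2.10), so the sampled estimate lies between 1 and
$\sqrt{3}$. The expansion argument is simple but does not give the sharp value of C.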

2.2 Dual spaces and norms 

Let V be a vector space, real or complex. By a linear functional on V we 
mean a linear mapping from V into the field of scalars, i.e., the real or complex 
numbers, as appropriate. The dual of V is the space of linear functionals on 
V, which is a vector space over the same field of scalars as V, with respect to 
pointwise addition and scalar multiplication. The dual of V is denoted V* , and 
it is well known that V* is also finite-dimensional when V is, with the same 
dimension as V. 

If $\| \cdot \|$ is a norm on V, then the corresponding dual norm $\| \cdot \|^*$ on $V^*$ is
defined as follows. If $\lambda$ is a linear functional on V, then

(2.11)    $\|\lambda\|^* = \sup\{ |\lambda(v)| : v \in V, \ \|v\| \le 1 \}$.

Equivalently,

(2.12)    $|\lambda(v)| \le \|\lambda\|^* \, \|v\|$

for every $v \in V$, and $\|\lambda\|^*$ is the smallest nonnegative real number with this
property. It is not difficult to verify that $\| \cdot \|^*$ defines a norm on $V^*$. In
particular, the finiteness of $\|\lambda\|^*$ can be derived from the remarks at the end of
the previous section.

For example, suppose that $V = \mathbf{R}^n$ or $\mathbf{C}^n$ for some positive integer n. We
can identify $V^*$ with $\mathbf{R}^n$ or $\mathbf{C}^n$, respectively, by associating to each w in $\mathbf{R}^n$ or
$\mathbf{C}^n$ the linear functional $\lambda_w$ on V given by

(2.13)    $\lambda_w(v) = \sum_{j=1}^n w_j v_j$.

Let $1 \le p, q \le \infty$ be conjugate exponents, which is to say that $1/p + 1/q = 1$,
and let us check that $\|\lambda_w\|^* = \|w\|_q$ is the dual norm for $\|v\| = \|v\|_p$.
First, we have that

(2.14)    $|\lambda_w(v)| \le \|w\|_q \, \|v\|_p$

for all w and v, by Hölder's inequality. To show that $\|\lambda_w\|^* = \|w\|_q$, we would
like to check that for each w there is a nonzero v such that

(2.15)    $|\lambda_w(v)| = \|w\|_q \, \|v\|_p$.

Let w be given. We may as well assume that $w \ne 0$, since otherwise any v
would do. Let us also assume for the moment that $p > 1$, so that $q < \infty$. Under
these conditions, we can define v by

(2.16)    $v_j = \overline{w_j} \, |w_j|^{q-2}$

when $w_j \ne 0$, and by $v_j = 0$ when $w_j = 0$. Here $\overline{w_j}$ is the complex conjugate
of $w_j$, which is not needed when we are working with real numbers instead of
complex numbers. With this choice of v, we have that

(2.17)    $\lambda_w(v) = \sum_{j=1}^n |w_j|^q = \|w\|_q^q$.

It remains to check that

(2.18)    $\|v\|_p = \|w\|_q^{q-1}$.

If $q = 1$, then $p = \infty$, and (2.18) reduces to

(2.19)    $\max_{1 \le j \le n} |v_j| = 1$.

In this case $|v_j| = 1$ for each j such that $v_j \ne 0$, which holds for at least one j
because $w \ne 0$. Thus we get (2.19). If $q > 1$, then $|v_j| = |w_j|^{q-1}$ for each j, and
one can verify (2.18) using the identity $p (q - 1) = q$.

Finally, if $p = 1$, and hence $q = \infty$, then choose l, $1 \le l \le n$, such that

(2.20)    $|w_l| = \max_{1 \le j \le n} |w_j| = \|w\|_\infty$.

Define v by $v_l = \overline{w_l} \, |w_l|^{-1}$ and $v_j = 0$ when $j \ne l$. This leads to

(2.21)    $\lambda_w(v) = |w_l| = \|w\|_\infty$

and $\|v\|_1 = 1$, as desired.
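
The construction of the extremal vector v in (2.16) and (2.20) is explicit
enough to test numerically. The Python sketch below builds v from a nonzero w
and compares the two sides of (2.15); the function names are assumptions for
this illustration, infinity is represented by `math.inf`, and w is assumed to
be nonzero.

```python
import math


def conjugate_exponent(q):
    """The exponent p with 1/p + 1/q = 1, allowing q = 1 and q = infinity."""
    if q == 1:
        return math.inf
    if q == math.inf:
        return 1.0
    return q / (q - 1)


def p_norm(v, p):
    """The norm in (2.3) for finite p, and the norm in (2.4) for p = infinity."""
    if p == math.inf:
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1 / p)


def pairing(w, v):
    """lambda_w(v) = sum_j w_j v_j, as in (2.13)."""
    return sum(x * y for x, y in zip(w, v))


def extremal_vector(w, q):
    """A vector v with |lambda_w(v)| = ||w||_q ||v||_p, for nonzero w."""
    if q == math.inf:
        # Case p = 1: concentrate v at an index where |w_j| is largest, as in (2.20).
        l = max(range(len(w)), key=lambda j: abs(w[j]))
        return [complex(w[j]).conjugate() / abs(w[j]) if j == l else 0.0
                for j in range(len(w))]
    # Case q finite: v_j = conj(w_j) |w_j|^(q - 2) when w_j != 0, as in (2.16).
    return [complex(x).conjugate() * abs(x) ** (q - 2) if x != 0 else 0.0
            for x in w]


w = [3.0, -4j, 0.0, 1 + 1j]
v = extremal_vector(w, 4)
left_side = abs(pairing(w, v))
right_side = p_norm(w, 4) * p_norm(v, conjugate_exponent(4))
```

With $q = 4$ both sides equal $\sum_j |w_j|^4 = 341$ for this w, up to rounding,
in agreement with (2.17) and (2.18).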

2.3 Second duals 

Let V be a vector space, and $V^*$ its dual space. The dual of $V^*$ is denoted $V^{**}$.

There is a canonical mapping from V into $V^{**}$, defined as follows. Let
$v \in V$ be given. For each $\lambda \in V^*$, we get a scalar by taking $\lambda(v)$. The mapping
$\lambda \mapsto \lambda(v)$ is a linear functional on $V^*$, and hence an element of $V^{**}$. Since we
can do this for every $v \in V$, we get a mapping from V into $V^{**}$. One can check
that this mapping is linear and an isomorphism from V onto $V^{**}$. For instance,
everything can be expressed in terms of a basis for V.

Now suppose that we have a norm $\| \cdot \|$ on V. This leads to a dual norm $\| \cdot \|^*$
on $V^*$, as in the previous section, and a double dual norm $\| \cdot \|^{**}$ on $V^{**}$. Using
the canonical isomorphism between V and $V^{**}$ just described, we can think of
$\| \cdot \|^{**}$ as defining a norm on V. We would like to show that

(2.22)    $\|v\|^{**} = \|v\|$ for every $v \in V$.

Note that this holds for the p-norms $\| \cdot \|_p$ on $\mathbf{R}^n$ and $\mathbf{C}^n$, by the analysis of
their duals in the preceding section.

Let $v \in V$ be given. By definition of the dual norm $\|\lambda\|^*$, we have that

(2.23)    $|\lambda(v)| \le \|\lambda\|^* \, \|v\|$

for every $\lambda \in V^*$, and hence that

(2.24)    $\|v\|^{**} \le \|v\|$.

It remains to show that the opposite inequality holds, which is trivial when
$v = 0$. Thus it suffices to show that there is a nonzero $\lambda_0 \in V^*$ such that

(2.25)    $\lambda_0(v) = \|\lambda_0\|^* \, \|v\|$

when $v \ne 0$.

Theorem 2.26 Let V be a real or complex vector space, and let $\| \cdot \|$ be a norm
on V. If W is a linear subspace of V and $\mu$ is a linear functional on W such
that

(2.27)    $|\mu(w)| \le \|w\|$ for every $w \in W$,

then there is a linear functional fi on V such that ji — p on W and \\fl\\* < 1. 
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The existence of a nonzero Ao G V* satisfying (2.25) follows easily from this, 
by first defining A on the span of v so that Xo(v) = \\v\\, and then extending to 
a linear functional on V with norm 1. 

To prove the theorem, let us begin by assuming that V is a real vector space. 
Afterwards, we shall discuss the complex case. 

Let W and \i be given as in the theorem, and let dimZ be the dimension 
of a linear subspace Z of V. For each integer j such that dim W < j < dim V, 
we would like to show that there is a linear subspace Wj of V and a linear 
functional \ij on Wj such that W C Wj, dimWj = j, fij = fj, on W, and 

(2.28) < IMI for evcr y w G Wj. 

If we can do this with j = dim]/, then Wj — V, and this would give a linear 
functional on V with the required properties. 

Let us show that we can do this by induction. For the base case j = dim W , 
we simply take Wj = W and fij = yU. Suppose that dimVF < j < dimV, and 
that Wj, fj,j are as above. We would like to choose Wj + \ and fj,j+i with the 
analogous properties for j + 1 instead of j. To be more precise, we shall choose 
them in such a way that Wj C Wj+\ and Hj+i is an extension of /ij to Wj+i. 

Under these conditions, $W_j$ is a proper subspace of $V$, and hence there is a
$z \in V \setminus W_j$. Fix any such $z$, and take $W_{j+1}$ to be the span of $W_j$ and $z$. Thus

(2.29)    $\dim W_{j+1} = \dim W_j + 1 = j + 1$.

Let $a$ be a real number, to be chosen later in the argument. If we set $\mu_{j+1}(z)$
equal to $a$, then $\mu_{j+1}$ is determined on all of $W_{j+1}$ by linearity and the condition
that $\mu_{j+1}$ be an extension of $\mu_j$. Specifically, each $w \in W_{j+1}$ can be expressed
in a unique way as $x + t z$ for some $x \in W_j$ and $t \in \mathbf{R}$, and

(2.30)    $\mu_{j+1}(w) = \mu_j(x) + t a$.

It remains to choose $a$ so that $\mu_{j+1}$ satisfies the analogue of (2.28) for $j + 1$,
which is to say that

(2.31)    $|\mu_{j+1}(w)| \le \|w\|$ for every $w \in W_{j+1}$.

Equivalently, we would like to choose $a$ so that

(2.32)    $|\mu_j(x) + t a| \le \|x + t z\|$ for every $x \in W_j$ and $t \in \mathbf{R}$.

It suffices to show that

(2.33)    $|\mu_j(x) + a| \le \|x + z\|$ for every $x \in W_j$,

since the case where $t = 0$ in (2.32) corresponds exactly to our induction
hypothesis (2.28), and one can reduce the case $t \ne 0$ to $t = 1$ using homogeneity,
replacing $x$ by $x/t$. Let us rewrite (2.33) as

(2.34)    $-\mu_j(x) - \|x + z\| \le a \le -\mu_j(x) + \|x + z\|$ for every $x \in W_j$.
It follows from (2.28) that

(2.35)    $\mu_j(x - y) \le \|x - y\|$ for every $x, y \in W_j$.

Using the triangle inequality, applied to $x - y = (x + z) - (y + z)$, we get that

(2.36)    $\mu_j(x - y) \le \|x + z\| + \|y + z\|$ for every $x, y \in W_j$,

and hence

(2.37)    $-\mu_j(y) - \|y + z\| \le -\mu_j(x) + \|x + z\|$ for every $x, y \in W_j$.

If $A$ is the supremum of the left side of this inequality over $y \in W_j$, and $B$ is
the infimum of the right side of this inequality over $x \in W_j$, then $A \le B$, and
any $a \in \mathbf{R}$ such that $A \le a \le B$ satisfies (2.34). This finishes the induction
argument, and the proof of Theorem 2.26 when $V$ is a real vector space.
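
The choice of $a$ between $A$ and $B$ can be tried out numerically. The following is
only a minimal sketch under stated assumptions: $V = \mathbf{R}^2$ with the norm $\|\cdot\|_1$,
$W_j$ spanned by $(1, 0)$, $\mu_j(x e_1) = x$, and $z = (0, 1)$ are all illustrative choices
that are not taken from the text, and the supremum and infimum are approximated
over a finite grid rather than over all of $W_j$.

```python
# One extension step in V = R^2 with the l^1 norm (an illustrative choice).

def norm1(v):
    """l^1 norm on R^2."""
    return abs(v[0]) + abs(v[1])

def mu_j(x):
    """Functional on W_j = span{(1, 0)}; x is the coordinate along (1, 0).
    Here |mu_j(x)| = |x| = norm1((x, 0)), so the bound (2.28) holds."""
    return x

z = (0.0, 1.0)  # a vector outside W_j

# Sample W_j on a finite grid; max and min over the grid approximate A and B.
grid = [k / 100.0 for k in range(-500, 501)]
A = max(-mu_j(y) - norm1((y + z[0], z[1])) for y in grid)
B = min(-mu_j(x) + norm1((x + z[0], z[1])) for x in grid)

a = 0.5 * (A + B)  # any value with A <= a <= B would do

def mu_next(x, t):
    """Extension to span{(1, 0), z}: mu_{j+1}(x e_1 + t z) = mu_j(x) + t a."""
    return mu_j(x) + t * a

# Largest violation of the bound (2.31) over sample points (should be <= 0).
worst = max(abs(mu_next(x, t)) - norm1((x, t))
            for x in grid[::50] for t in grid[::50])
```

In this example the admissible interval is approximately $[-1, 1]$, which shows
that the extension need not be unique.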

Consider now the case of a complex vector space $V$. The real part of a
linear functional on $V$ is also a linear functional on $V$ as a real vector space, i.e.,
forgetting about multiplication by $i$. Conversely, if $\phi$ is a real-valued function
on $V$ which is linear with respect to vector addition and scalar multiplication
by real numbers, then there is a unique complex linear functional $\psi$ on $V$ whose
real part is $\phi$, given by

(2.38)    $\psi(v) = \phi(v) - i \, \phi(i v)$.

For any complex number $\zeta$,

(2.39)    $|\zeta| = \sup\{\mathrm{Re}(a \zeta) : a \in \mathbf{C}, \ |a| \le 1\}$.

If $V$ is equipped with a norm $\|v\|$, then the dual norm of a complex linear
functional $\lambda$ on $V$ can be expressed as

(2.40)    $\|\lambda\|^* = \sup\{\mathrm{Re}(a \lambda(v)) : v \in V, \ a \in \mathbf{C}, \ \|v\| \le 1, \ |a| \le 1\}$.

By linearity, $\lambda(a v) = a \lambda(v)$, which implies that

(2.41)    $\|\lambda\|^* = \sup\{\mathrm{Re} \, \lambda(v) : v \in V, \ \|v\| \le 1\}$.

Thus the norm of a complex linear functional on $V$ is the same as the norm
of its real part, as a linear functional on the real version of $V$. To prove the
extension theorem in the complex case, one can apply the extension theorem in
the real case to the real part of the given complex linear functional on a complex
linear subspace, and then complexify the real extension to get a complex linear
extension with the same estimate for the norm.
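
The complexification formula (2.38) can be checked on a small example. This is a
minimal sketch, assuming $V = \mathbf{C}^2$ and an arbitrarily chosen functional `lam`;
the coefficients and the test vector are hypothetical and serve only as illustration.

```python
def lam(v):
    """A sample complex linear functional on C^2 (coefficients are arbitrary)."""
    return (2 - 1j) * v[0] + (0.5 + 3j) * v[1]

def phi(v):
    """Real part of lam; linear over the reals only."""
    return lam(v).real

def psi(v):
    """Complexification as in (2.38): psi(v) = phi(v) - i phi(i v)."""
    iv = tuple(1j * c for c in v)
    return phi(v) - 1j * phi(iv)

v = (1 + 2j, -3 + 0.5j)
difference = abs(psi(v) - lam(v))  # psi recovers lam from its real part
```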

2.4 Linear transformations 

Let $V_1$ and $V_2$ be vector spaces, both real or both complex, equipped with norms
$\|\cdot\|_1$ and $\|\cdot\|_2$, respectively. Here the subscripts are merely labels to distinguish
these norms, rather than referring to the $p$-norms described in Section 2.1. The
corresponding operator norm $\|T\|_{op}$ of a linear transformation $T$ from $V_1$ into
$V_2$ is defined by

(2.42)    $\|T\|_{op} = \sup\{\|T(v)\|_2 : v \in V_1, \ \|v\|_1 \le 1\}$.

Equivalently,

(2.43)    $\|T(v)\|_2 \le \|T\|_{op} \, \|v\|_1$ for every $v \in V_1$,

and $\|T\|_{op}$ is the smallest nonnegative real number with this property. The
finiteness of $\|T\|_{op}$ is easy to check using the remarks at the end of Section 2.1.

The space $\mathcal{L}(V_1, V_2)$ of all linear transformations from $V_1$ into $V_2$ is a vector
space in a natural way, using pointwise addition and scalar multiplication of
linear transformations, with the same scalar field as for $V_1$ and $V_2$. One can
verify that the operator norm $\|\cdot\|_{op}$ on $\mathcal{L}(V_1, V_2)$ is a norm on this vector space.
Note that the dual $V^*$ of a vector space $V$ is the same as $\mathcal{L}(V, \mathbf{R})$ or $\mathcal{L}(V, \mathbf{C})$, as
appropriate, and the dual norm on $V^*$ associated to a norm on $V$ is the same
as the operator norm with respect to the standard norm on $\mathbf{R}$ or $\mathbf{C}$.

Suppose that $V_3$ is another vector space, with the same field of scalars as $V_1$
and $V_2$, and equipped with a norm $\|\cdot\|_3$. If $T_1 : V_1 \to V_2$ and $T_2 : V_2 \to V_3$ are
linear mappings, then the composition $T_2 \circ T_1$ is the linear mapping from $V_1$ to
$V_3$ given by

(2.44)    $(T_2 \circ T_1)(v) = T_2(T_1(v))$.

It is easy to see that the operator norm of $T_2 \circ T_1$ is less than or equal to the
product of the operator norms of $T_1$ and $T_2$ with respect to the given norms on
$V_1$, $V_2$, and $V_3$.

Let $V_1$ and $V_2$ be vector spaces, both real or both complex, and let $T$ be a
linear transformation from $V_1$ into $V_2$. There is a canonical dual linear
transformation $T^* : V_2^* \to V_1^*$ corresponding to $T$, defined by

(2.45)    $T^*(\mu) = \mu \circ T$ for every $\mu \in V_2^*$.

In other words, if $\mu$ is a linear functional on $V_2$, then $\mu \circ T$ is a linear functional
on $V_1$, and $T^*(\mu)$ is this linear functional. If $R, T : V_1 \to V_2$ are linear mappings
and $a$, $b$ are scalars, then

(2.46)    $(a R + b T)^* = a R^* + b T^*$.

If $V_3$ is another vector space with the same field of scalars as $V_1$ and $V_2$, and if
$T_1 : V_1 \to V_2$, $T_2 : V_2 \to V_3$ are linear mappings, then

(2.47)    $(T_2 \circ T_1)^* = T_1^* \circ T_2^*$.

If $T$ is a linear mapping from $V_1$ to $V_2$, then we can pass to the second duals
to get a linear transformation $T^{**} : V_1^{**} \to V_2^{**}$. As in Section 2.3, there are
canonical isomorphisms between $V_1$ and $V_1^{**}$, and between $V_2$ and $V_2^{**}$, which
allow one to identify $T^{**}$ with a linear mapping from $V_1$ to $V_2$. It is easy to see
that this mapping is the same as $T$.

The identity transformation $I = I_V$ on a vector space $V$ is the mapping that
takes each $v \in V$ to itself, and the dual of $I_V$ is equal to the identity mapping
$I_{V^*}$ on $V^*$. A one-to-one linear transformation $T$ from $V_1$ onto $V_2$ is said to be
invertible, which implies that there is a linear transformation $T^{-1} : V_2 \to V_1$
such that

(2.48)    $T^{-1} \circ T = I_{V_1}$ and $T \circ T^{-1} = I_{V_2}$.

One can check that $T : V_1 \to V_2$ is invertible if and only if $T^* : V_2^* \to V_1^*$ is
invertible, in which event

(2.49)    $(T^{-1})^* = (T^*)^{-1}$.

If $T_1 : V_1 \to V_2$ and $T_2 : V_2 \to V_3$ are both invertible, then their composition
$T_2 \circ T_1 : V_1 \to V_3$ is invertible too, with

(2.50)    $(T_2 \circ T_1)^{-1} = T_1^{-1} \circ T_2^{-1}$.

Let $\|\cdot\|_1$ and $\|\cdot\|_2$ be norms on $V_1$ and $V_2$ again, and let $\|\cdot\|_1^*$ and $\|\cdot\|_2^*$
be the corresponding dual norms on $V_1^*$ and $V_2^*$. We also have the associated
operator norm $\|\cdot\|_{op}$ on $\mathcal{L}(V_1, V_2)$, and the operator norm $\|\cdot\|_{op*}$ on $\mathcal{L}(V_2^*, V_1^*)$
determined by the dual norms on $V_1^*$ and $V_2^*$. It is easy to see that

(2.51)    $\|T^*\|_{op*} \le \|T\|_{op}$

for each linear mapping $T : V_1 \to V_2$, directly from the definitions. Using linear
functionals as in (2.25), one can show that the opposite inequality holds, so that

(2.52)    $\|T\|_{op} = \|T^*\|_{op*}$.

Alternatively, one can get the opposite inequality by applying (2.51) to $T^*$
instead of $T$ and identifying $T^{**}$ with $T$.



2.5 Some special cases 

Let $V$ be $\mathbf{R}^n$ or $\mathbf{C}^n$ for some positive integer $n$, and let $\|\cdot\|_p$ be the norm on $V$
described in Section 2.1 for some $p$, $1 \le p \le \infty$. If $T$ is a linear mapping from
$V$ into $V$, then we can express $T$ in terms of an $n \times n$ matrix $(a_{j,k})$ of real or
complex numbers, as appropriate, through the formula

(2.53)    $(T(v))_j = \sum_{k=1}^n a_{j,k} v_k$.

Here $v_k$ is the $k$th component of $v \in V$, $(T(v))_j$ is the $j$th component of $T(v)$,
and conversely any $n \times n$ matrix $(a_{j,k})$ of real or complex numbers determines
such a linear transformation $T$. Let us write $\|T\|_{op,pp}$ for the operator norm of
$T$ with respect to the norm $\|\cdot\|_p$, used on $V$ both as the domain and range of
$T$. These operator norms can be given explicitly when $p = 1, \infty$, by

(2.54)    $\|T\|_{op,11} = \max_{1 \le k \le n} \sum_{j=1}^n |a_{j,k}|$

and

(2.55)    $\|T\|_{op,\infty\infty} = \max_{1 \le j \le n} \sum_{k=1}^n |a_{j,k}|$.

To see this, let $e_1, \ldots, e_n$ denote the standard basis vectors of $V$, so that the
$k$th coordinate of $e_k$ is equal to 1 and the other coordinates are equal to 0. The
right side of (2.54) is the same as

(2.56)    $\max_{1 \le k \le n} \|T(e_k)\|_1$.

This is obviously less than or equal to $\|T\|_{op,11}$, by definition of the operator
norm. The opposite inequality can be derived by expressing any $v \in V$ as a
linear combination of the $e_k$'s and estimating $\|T(v)\|_1$ in terms of the $\|T(e_k)\|_1$'s.
Similarly, for $p = \infty$, we use the fact that

(2.57)    $\|T(w)\|_\infty = \max_{1 \le j \le n} |(T(w))_j|$

for every $w \in V$, by the definition of the $\|\cdot\|_\infty$ norm. Clearly

(2.58)    $|(T(w))_j| \le \sum_{k=1}^n |a_{j,k}|$

when $w \in V$ and $\|w\|_\infty \le 1$, so that

(2.59)    $\|T\|_{op,\infty\infty} \le \max_{1 \le j \le n} \sum_{k=1}^n |a_{j,k}|$.

To get the opposite inequality, one can observe that for each $j$ there is a $w \in V$
such that $\|w\|_\infty = 1$ and equality holds in (2.58).
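
The formulas (2.54) and (2.55) are easy to evaluate directly. The sketch below is
only an illustration: the function names and the $2 \times 2$ matrix are hypothetical,
and the two test vectors are the extremal vectors suggested by the argument above
(a standard basis vector for $p = 1$, a vector of signs for $p = \infty$).

```python
def op_norm_1(a):
    """Operator norm for the l^1 norm, as in (2.54): largest absolute column sum."""
    n = len(a)
    return max(sum(abs(a[j][k]) for j in range(n)) for k in range(n))

def op_norm_inf(a):
    """Operator norm for the l^infinity norm, as in (2.55): largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in a)

def apply(a, v):
    """Matrix-vector product, as in (2.53)."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in a]

a = [[1, -2], [3, 4]]

e2 = [0, 1]  # basis vector for the column with the largest absolute sum
w = [1, 1]   # signs chosen so that equality holds in (2.58) for the second row

norm1_of_Te2 = sum(abs(x) for x in apply(a, e2))   # attains op_norm_1(a)
norminf_of_Tw = max(abs(x) for x in apply(a, w))   # attains op_norm_inf(a)
```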

If $(a_{j,k})$ happens to be a diagonal matrix, so that $a_{j,k} = 0$ when $j \ne k$, then
$\|T\|_{op,pp}$ is equal to the maximum of $|a_{j,j}|$, $1 \le j \le n$, for every $p$. Otherwise, it
may not be so easy to compute $\|T\|_{op,pp}$ when $1 < p < \infty$. A famous theorem
of Schur states that

(2.60)    $\|T\|_{op,pp} \le \|T\|_{op,11}^{1/p} \, \|T\|_{op,\infty\infty}^{1 - 1/p}$.

To show this, fix $p \in (1, \infty)$, and observe that

(2.61)    $|(T(v))_j|^p \le \Big(\sum_{k=1}^n |a_{j,k}|\Big)^{p-1} \sum_{l=1}^n |a_{j,l}| \, |v_l|^p \le \|T\|_{op,\infty\infty}^{p-1} \sum_{l=1}^n |a_{j,l}| \, |v_l|^p$

for each $j = 1, \ldots, n$ and $v \in V$. This uses Hölder's inequality or simply
the convexity of $\phi(r) = |r|^p$ for the first inequality, and (2.55) for the second.
Therefore

(2.62)    $\sum_{j=1}^n |(T(v))_j|^p \le \|T\|_{op,\infty\infty}^{p-1} \sum_{j=1}^n \sum_{l=1}^n |a_{j,l}| \, |v_l|^p \le \|T\|_{op,\infty\infty}^{p-1} \, \|T\|_{op,11} \sum_{l=1}^n |v_l|^p$.

Schur's theorem follows by taking the $p$th root of both sides of this inequality.
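
As a sanity check of (2.60), one can compare the ratio $\|T(v)\|_p / \|v\|_p$ with Schur's
bound on a few vectors. This is a rough sketch, not a proof: the matrix, the value
$p = 2$, and the sample vectors are arbitrary illustrative choices, and sampling
only gives a lower estimate for $\|T\|_{op,pp}$.

```python
def p_norm(v, p):
    """The p-norm of a vector, for finite p >= 1."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def apply(a, v):
    """Matrix-vector product, as in (2.53)."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in a]

def schur_bound(a, p):
    """Right side of (2.60), using the explicit formulas (2.54) and (2.55)."""
    n = len(a)
    col = max(sum(abs(a[j][k]) for j in range(n)) for k in range(n))
    row = max(sum(abs(x) for x in r) for r in a)
    return col ** (1.0 / p) * row ** (1.0 - 1.0 / p)

a = [[1.0, -2.0], [3.0, 4.0]]
p = 2.0
bound = schur_bound(a, p)  # equals sqrt(6 * 7) for this matrix

samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0], [2.0, -3.0]]
ratios = [p_norm(apply(a, v), p) / p_norm(v, p) for v in samples]
```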

2.6 Inner product spaces 

Let $V$ be a real or complex vector space. An inner product on $V$ is a scalar-valued
function $\langle \cdot, \cdot \rangle$ on $V \times V$ with the following properties: (1) for each $w \in V$,
$v \mapsto \langle v, w \rangle$ is a linear functional on $V$; (2) if $V$ is a real vector space, then

(2.63)    $\langle w, v \rangle = \langle v, w \rangle$ for every $v, w \in V$,

and if $V$ is a complex vector space, then

(2.64)    $\langle w, v \rangle = \overline{\langle v, w \rangle}$ for every $v, w \in V$;

(3) the inner product is positive definite, in the sense that $\langle v, v \rangle$ is a positive
real number for every $v \in V$ such that $v \ne 0$. Note that $\langle v, w \rangle = 0$ whenever
$v = 0$ or $w = 0$, and that $\langle v, v \rangle \in \mathbf{R}$ for every $v \in V$ even when $V$ is complex.

A vector space with an inner product is called an inner product space. If
$(V, \langle \cdot, \cdot \rangle)$ is an inner product space, then we put

(2.65)    $\|v\| = \langle v, v \rangle^{1/2}$

for every $v \in V$. The Cauchy-Schwarz inequality says that

(2.66)    $|\langle v, w \rangle| \le \|v\| \, \|w\|$

for every $v, w \in V$, and it can be proved using the fact that

(2.67)    $\langle v + a w, v + a w \rangle \ge 0$

for all scalars $a$. One can show that $\|\cdot\|$ satisfies the triangle inequality, and
is therefore a norm on $V$, by expanding $\|v + w\|^2$ as a sum of inner products
and applying the Cauchy-Schwarz inequality. For each positive integer $n$, the
standard inner products on $\mathbf{R}^n$ and $\mathbf{C}^n$ are given by

(2.68)    $\langle v, w \rangle = \sum_{j=1}^n v_j w_j$

on $\mathbf{R}^n$ and

(2.69)    $\langle v, w \rangle = \sum_{j=1}^n v_j \overline{w_j}$

on $\mathbf{C}^n$, and the associated norms are the standard Euclidean norms on $\mathbf{R}^n$, $\mathbf{C}^n$.

Let $(V, \langle \cdot, \cdot \rangle)$ be a real or complex inner product space. A pair of vectors
$v, w \in V$ are said to be orthogonal if

(2.70)    $\langle v, w \rangle = 0$,

which may be expressed symbolically by $v \perp w$. This condition is symmetric in
$v$ and $w$, and implies that

(2.71)    $\|v + w\|^2 = \|v\|^2 + \|w\|^2$.

A collection $v_1, \ldots, v_n$ of vectors in $V$ is said to be orthonormal if $v_j \perp v_l$ when
$j \ne l$ and $\|v_j\| = 1$ for each $j$. In this case, if $c_1, \ldots, c_n$ are scalars and

(2.72)    $w = c_1 v_1 + \cdots + c_n v_n$,

then

(2.73)    $c_j = \langle w, v_j \rangle$

for each $j$, and

(2.74)    $\|w\|^2 = \sum_{j=1}^n |c_j|^2$.

An orthonormal basis for $V$ is an orthonormal collection of vectors in $V$ whose
linear span is equal to $V$. For example, the standard bases in $\mathbf{R}^n$ and $\mathbf{C}^n$ are
orthonormal with respect to the standard inner products.

Suppose that $v_1, \ldots, v_n$ are orthonormal vectors in $V$, and define a linear
transformation $P : V \to V$ by

(2.75)    $P(w) = \sum_{j=1}^n \langle w, v_j \rangle \, v_j$.

Observe that

(2.76)    $P(w) = w$

when $w \in V$ is a linear combination of $v_1, \ldots, v_n$. If $w$ is any vector in $V$, then

(2.77)    $\langle P(w), v_j \rangle = \langle w, v_j \rangle$

for each $j$, and hence $(w - P(w)) \perp v_j$ for each $j$. Thus $(w - P(w)) \perp P(w)$, and

(2.78)    $\|w\|^2 = \|P(w)\|^2 + \|w - P(w)\|^2 = \sum_{j=1}^n |\langle w, v_j \rangle|^2 + \|w - P(w)\|^2$.



Suppose that $w$ is an element of $V$ which is not in the span of $v_1, \ldots, v_n$.
This implies that $w - P(w) \ne 0$, and

(2.79)    $u = \dfrac{w - P(w)}{\|w - P(w)\|}$

satisfies $\|u\| = 1$ and $u \perp v_j$ for each $j$. It follows that $v_1, \ldots, v_n, u$ is an
orthonormal collection of vectors in $V$ whose linear span is the same as the
span of $v_1, \ldots, v_n, w$. By repeating the process, we can extend $v_1, \ldots, v_n$ to an
orthonormal basis of $V$. In particular, every finite-dimensional inner product
space has an orthonormal basis.
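
The extension process just described is the Gram-Schmidt procedure. A minimal
sketch follows, assuming $V = \mathbf{R}^n$ with the standard inner product (2.68); the
function names, the tolerance `tol` used to detect vectors already in the span, and
the input vectors are illustrative assumptions.

```python
import math

def inner(v, w):
    """Standard inner product on R^n, as in (2.68)."""
    return sum(x * y for x, y in zip(v, w))

def gram_schmidt(vectors, tol=1e-12):
    """Build an orthonormal collection spanning the same subspace as `vectors`."""
    basis = []
    for w in vectors:
        # P(w) as in (2.75), with respect to the orthonormal vectors found so far.
        pw = [sum(inner(w, v) * v[i] for v in basis) for i in range(len(w))]
        r = [x - y for x, y in zip(w, pw)]
        nr = math.sqrt(inner(r, r))
        if nr > tol:  # w is not (numerically) in the span of the current vectors
            basis.append([x / nr for x in r])  # the vector u of (2.79)
    return basis

# The third input is the sum of the first two, so it contributes nothing new.
basis = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0],
                      [2.0, 1.0, 1.0], [0.0, 0.0, 1.0]])
```

Subtracting the projection from each new vector and normalizing the remainder is
numerically fragile for nearly dependent inputs, so this should be read as an
illustration of the argument rather than as robust numerical code.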

The orthogonal complement $W^\perp$ of a linear subspace $W$ of $V$ is defined by

(2.80)    $W^\perp = \{v \in V : \langle v, w \rangle = 0 \text{ for every } w \in W\}$,

and is also a linear subspace of $V$. Note that

(2.81)    $W \cap W^\perp = \{0\}$,

since $v \perp v$ if and only if $v = 0$. Let $v_1, \ldots, v_n$ be an orthonormal basis for $W$,
and let $P : V \to V$ be defined as in (2.75). Thus

(2.82)    $P(v) \in W$ and $v - P(v) \in W^\perp$

for every $v \in V$. If $v \in V$ and $x, y \in W$ satisfy $v - x, v - y \in W^\perp$, then
$x - y \in W \cap W^\perp$, and hence $x - y = 0$. Therefore $P(v)$ is uniquely determined
by (2.82), and does not depend on the choice of orthonormal basis $v_1, \ldots, v_n$ for
$W$. This linear transformation is called the orthogonal projection of $V$ onto $W$,
and may be denoted $P_W$.

Note that

(2.83)    $\lambda_w(v) = \langle v, w \rangle$

defines a linear functional on $V$ for each $w \in V$, and that

(2.84)    $|\lambda_w(v)| \le \|v\| \, \|w\|$

for every $v \in V$, by the Cauchy-Schwarz inequality. Thus the dual norm of $\lambda_w$
corresponding to the norm $\|\cdot\|$ on $V$ is less than or equal to $\|w\|$. In fact, the
dual norm of $\lambda_w$ is equal to $\|w\|$, because $\lambda_w(w) = \|w\|^2$.

Conversely, every linear functional $\lambda$ on $V$ can be represented as $\lambda_w$ for some
$w \in V$. To see this, let $v_1, \ldots, v_n$ be an orthonormal basis of $V$. If

(2.85)    $w = \sum_{j=1}^n \overline{\lambda(v_j)} \, v_j$,

where the complex conjugation has no effect in the real case, then

(2.86)    $\langle v_j, w \rangle = \lambda(v_j)$

for each $j$, and hence $\lambda(v) = \lambda_w(v)$ for every $v \in V$. It is easy to reverse this
argument to show that $w$ is uniquely determined by $\lambda$.



Remark 2.87 If $(V, \langle \cdot, \cdot \rangle)$ is a real or complex inner product space, and if $\|\cdot\|$
is the norm associated to the inner product, then there is a simple formula for
the inner product in terms of the norm, through polarization. Specifically, the
polarization identities are

(2.88)    $4 \langle v, w \rangle = \|v + w\|^2 - \|v - w\|^2$

in the real case, and

(2.89)    $4 \langle v, w \rangle = \|v + w\|^2 - \|v - w\|^2 + i \, \|v + i w\|^2 - i \, \|v - i w\|^2$

in the complex case. The norm also satisfies the parallelogram law

(2.90)    $\|v + w\|^2 + \|v - w\|^2 = 2 (\|v\|^2 + \|w\|^2)$ for every $v, w \in V$.

Conversely, if $V$ is a vector space and $\|\cdot\|$ is a norm on $V$ which satisfies the
parallelogram law, then there is an inner product on $V$ for which $\|\cdot\|$ is the
associated norm. This is a well-known fact, which can be established using
(2.88) or (2.89), as appropriate, to define $\langle v, w \rangle$, and using the parallelogram
law to show that this is an inner product.
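
The complex polarization identity (2.89) can be verified numerically for the
standard inner product (2.69) on $\mathbf{C}^n$. The sketch below assumes that convention
(linear in the first variable, conjugate-linear in the second); the helper names
and the two sample vectors are hypothetical.

```python
def inner(v, w):
    """Standard inner product on C^n, as in (2.69)."""
    return sum(x * y.conjugate() for x, y in zip(v, w))

def norm_sq(v):
    """Squared norm associated to the inner product."""
    return inner(v, v).real

def shifted(v, w, s):
    """The vector v + s w for a scalar s."""
    return [x + s * y for x, y in zip(v, w)]

def polarization(v, w):
    """Right side of (2.89), divided by 4."""
    return (norm_sq(shifted(v, w, 1)) - norm_sq(shifted(v, w, -1))
            + 1j * norm_sq(shifted(v, w, 1j))
            - 1j * norm_sq(shifted(v, w, -1j))) / 4

v = [1 + 2j, -1j, 3.0 + 0j]
w = [2 - 1j, 0.5 + 0.5j, -1 + 4j]
gap = abs(polarization(v, w) - inner(v, w))

# Parallelogram law (2.90) for the same pair of vectors.
parallelogram_gap = abs(norm_sq(shifted(v, w, 1)) + norm_sq(shifted(v, w, -1))
                        - 2 * (norm_sq(v) + norm_sq(w)))
```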



2.7 Some more special cases 

Let $V_1$, $V_2$ be vector spaces, both real or both complex, equipped with norms
$\|\cdot\|_{V_1}$, $\|\cdot\|_{V_2}$, respectively. Suppose first that $V_1$ is $\mathbf{R}^n$ or $\mathbf{C}^n$ for some positive
integer $n$, and that $\|\cdot\|_{V_1}$ is the norm $\|\cdot\|_1$ from Section 2.1. Let $e_1, \ldots, e_n$
be the standard basis vectors in $V_1$, so that the $k$th coordinate of $e_k$ is equal to
1 for each $k$, and the rest of the coordinates are equal to 0. If $T$ is any linear
mapping from $V_1$ into $V_2$, then

(2.91)    $\|T\|_{op} = \max_{1 \le k \le n} \|T(e_k)\|_{V_2}$.

This reduces to (2.54) when $V_2 = V_1$ with the norm $\|\cdot\|_1$ from Section 2.1, and
essentially the same argument works for any norm on any $V_2$. As before,

(2.92)    $\|T(e_k)\|_{V_2} \le \|T\|_{op}$

for each $k$ by definition of the operator norm, since $e_k$ has norm 1 in $V_1$ for each
$k$, which implies that $\|T\|_{op}$ is greater than or equal to the right side of (2.91). To
get the opposite inequality, one can express any $v \in V_1$ as $\sum_{k=1}^n v_k e_k$, where
$v_1, \ldots, v_n$ are the coordinates of $v$ in $V_1 = \mathbf{R}^n$ or $\mathbf{C}^n$, and observe that

(2.93)    $\|T(v)\|_{V_2} \le \sum_{k=1}^n |v_k| \, \|T(e_k)\|_{V_2} \le \Big(\max_{1 \le k \le n} \|T(e_k)\|_{V_2}\Big) \|v\|_1$.

Now let $V_1$ be any vector space with any norm $\|\cdot\|_{V_1}$, and let $V_2$ be $\mathbf{R}^n$ or
$\mathbf{C}^n$ with the norm $\|\cdot\|_\infty$ from Section 2.1. Let $T$ be a linear mapping from
$V_1$ into $V_2$ again, and let $\lambda_j(v)$ be the $j$th component of $T(v)$ in $\mathbf{R}^n$ or $\mathbf{C}^n$, as
appropriate, for $j = 1, \ldots, n$. Thus $\lambda_j$ is a linear functional on $V_1$, with a dual
norm $\|\lambda_j\|_{V_1}^*$ with respect to the norm $\|\cdot\|_{V_1}$ on $V_1$ for each $j$. In this case, it
is easy to see that

(2.94)    $\|T\|_{op} = \max_{1 \le j \le n} \|\lambda_j\|_{V_1}^*$,

using the definitions of the dual norm, the operator norm, and the norm $\|\cdot\|_\infty$
on $V_2$. If $V_1 = V_2$ equipped with the norm $\|\cdot\|_\infty$ from Section 2.1, then (2.94)
reduces to (2.55), because of the standard identification of the dual of the norm
$\|\cdot\|_\infty$ on $\mathbf{R}^n$ or $\mathbf{C}^n$ with the norm $\|\cdot\|_1$ from Section 2.1, as in Section 2.2.
Note that this case is dual to the previous one, in the sense that it can be
applied to the dual of a linear mapping as in the previous paragraph. Similarly,
the remarks in the previous paragraph can be applied to the dual of a linear
mapping as in this paragraph.



2.8 Quotient spaces 

Let $V$ be a real or complex vector space, and let $W$ be a linear subspace of
$V$. The quotient $V/W$ of $V$ by $W$ is defined by identifying $v, v' \in V$ when
$v - v' \in W$. More precisely, one can define an equivalence relation $\sim$ on $V$ by
saying that $v \sim v'$ when $v - v' \in W$, and the elements of $V/W$ correspond to
equivalence classes in $V$ determined by $\sim$. By standard arguments, $V/W$ is a
vector space in a natural way, and there is a canonical quotient mapping $q$ from
$V$ onto $V/W$ that sends each $v \in V$ to the equivalence class that contains it,
and which is a linear mapping from $V$ onto $V/W$ whose kernel is $W$.

If $V$ is equipped with a norm $\|\cdot\|$, then there is a natural quotient norm
$\|\cdot\|_Q$ on $V/W$ defined by

(2.95)    $\|q(v)\|_Q = \inf\{\|v + w\| : w \in W\}$.

It is not too difficult to show that this does determine a norm on $V/W$. More
precisely, to check that $\|q(v)\|_Q > 0$ when $v \in V \setminus W$ and hence $q(v) \ne 0$ in
$V/W$, one can use the remarks at the end of Section 2.1, and the fact that
linear subspaces of $\mathbf{R}^n$ and $\mathbf{C}^n$ are closed with respect to the standard topology
on those spaces. Note that the operator norm of $q$ is less than or equal to 1
with respect to the given norm on $V$ and the corresponding quotient norm on
$V/W$, and that it is equal to 1 when $W \ne V$.

The dual $(V/W)^*$ of $V/W$ can be identified with a subspace of $V^*$ in a natural
way. Of course, every linear functional on $V/W$ determines a linear functional
on $V$, by composition with the quotient mapping $q$. The linear functionals on
$V$ that occur in this way are exactly those that are equal to 0 on $W$. If $\lambda$ is
a linear functional on $V$ that is equal to 0 on $W$ and $k$ is a nonnegative real
number, then the statements

(2.96)    $|\lambda(v)| \le k \, \|v\|$ for every $v \in V$

and

(2.97)    $|\lambda(v)| \le k \, \inf\{\|v + w\| : w \in W\}$ for every $v \in V$

are equivalent to each other. This means that the dual norm of $\lambda$ as a linear
functional on $V$ with respect to $\|\cdot\|$ is the same as the dual norm of the linear
functional on $V/W$ that corresponds to $\lambda$ under the quotient mapping with
respect to the quotient norm on $V/W$.

If $v$ is any element of $V$, then there is a $w_0 \in W$ such that

(2.98)    $\|v + w_0\| \le \|v + w\|$

for every $w \in W$, so that the infimum in the definition of the quotient norm
is attained. To see this, we may as well suppose that $V = \mathbf{R}^n$ or $\mathbf{C}^n$ for some
positive integer $n$, since every real or complex vector space of positive finite
dimension is isomorphic to one of these. As in Section 2.1, the norm $\|\cdot\|$
defines a continuous function on $\mathbf{R}^n$ or $\mathbf{C}^n$, as appropriate, with respect to
the standard Euclidean metric and topology. It is also well known that linear
subspaces of $\mathbf{R}^n$ and $\mathbf{C}^n$ are closed sets with respect to the standard Euclidean
metric and topology. Remember too that $\|\cdot\|$ is bounded from below by a
positive constant multiple of the standard Euclidean norm on $\mathbf{R}^n$ or $\mathbf{C}^n$, as
appropriate, as in (2.10). Using this, it suffices to consider a bounded subset of
$W$ when minimizing $\|v + w\|$ over $w \in W$. This permits the existence of the
minimum to be derived from well-known results about minimizing continuous
functions on compact sets, because closed and bounded subsets of $\mathbf{R}^n$ and $\mathbf{C}^n$
are compact.
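
For a one-dimensional subspace the minimization can be carried out numerically.
The following sketch assumes $V = \mathbf{R}^2$ with the norm $\|\cdot\|_\infty$ and $W$ spanned by a
single vector; it uses ternary search, which applies because $t \mapsto \|v + t w\|$ is
convex. The bracket `[lo, hi]` plays the role of the bounded subset of $W$ mentioned
above, and its size, like the function names, is an assumption of this example.

```python
def norm_inf(v):
    """The l^infinity norm."""
    return max(abs(x) for x in v)

def quotient_norm(v, w, norm, lo=-100.0, hi=100.0, iters=200):
    """Approximate the infimum of norm(v + t w) over real t, for W = span{w}.

    Returns the approximate minimum value and a minimizing parameter t0.
    The interval [lo, hi] is assumed to contain a minimizer.
    """
    def f(t):
        return norm([x + t * y for x, y in zip(v, w)])

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) <= f(m2):
            hi = m2  # by convexity, a minimizer lies in [lo, m2]
        else:
            lo = m1  # by convexity, a minimizer lies in [m1, hi]
    t0 = 0.5 * (lo + hi)
    return f(t0), t0

# For v = (3, 1) and w = (1, 1), max(|3 + t|, |1 + t|) is smallest at t = -2.
value, t0 = quotient_norm([3.0, 1.0], [1.0, 1.0], norm_inf)
```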



2.9 Projections 

Let $V$ be a real or complex vector space, and let $U$ and $W$ be linear subspaces
of $V$. Suppose that $U \cap W = \{0\}$, and that every $v \in V$ can be expressed as

(2.99)    $v = u + w$

for some $u \in U$ and $w \in W$. If $u' \in U$ and $w' \in W$ also satisfy $v = u' + w'$, then

(2.100)    $u - u' = w' - w$,

and this implies that $u = u'$ and $w = w'$, because $u - u' \in U$, $w' - w \in W$, and
$U \cap W = \{0\}$. In this case, $U$ and $W$ are said to be complementary in $V$.
Consider the mapping $P : V \to V$ defined by

(2.101)    $P(v) = u$

for each $v \in V$, where $u \in U$ is as in (2.99). It is easy to see that $P$ is a linear
mapping of $V$ onto $U$ with kernel equal to $W$, and that

(2.102)    $P(u) = u$

for every $u \in U$. Conversely, suppose that $P$ is a linear mapping from $V$ onto
a linear subspace $U$ of $V$ such that the restriction of $P$ to $U$ is equal to the
identity mapping on $U$. If $W$ is the kernel of $P$, then

(2.103)    $v - P(v) \in W$

for every $v \in V$, and it follows that $U$ and $W$ are complementary in $V$.

A linear mapping $P : V \to V$ is said to be a projection if

(2.104)    $P \circ P = P$.

This is equivalent to saying that the restriction of $P$ to $U = P(V)$ is the identity
mapping on $U$, as in the previous paragraph, so that $U$ is complementary to
the kernel $W$ of $P$. If $P$ is a projection on $V$, then it is easy to see that $I - P$
is also a projection on $V$, where $I$ denotes the identity mapping on $V$, because

(2.105)    $(I - P) \circ (I - P) = I - P - P + P \circ P = I - P$.

More precisely, $I - P$ maps $V$ onto the kernel $W$ of $P$, and the kernel of $I - P$
is $U = P(V)$. Of course, orthogonal projections onto linear subspaces of inner
product spaces are projections in this sense.
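
A small matrix example may help fix ideas. In this sketch, which is not taken
from the text, $V = \mathbf{R}^2$, $U$ is spanned by $(1, 0)$, and $W$ is spanned by $(1, 1)$, so
that $(x, y) = (x - y)(1, 0) + y (1, 1)$ and the projection onto $U$ with kernel $W$ has
the matrix `P` below. The helper names are hypothetical.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apply(a, v):
    """Matrix-vector product."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in a]

identity = [[1, 0], [0, 1]]
P = [[1, -1], [0, 0]]  # projection onto U = span{(1, 0)} with kernel W = span{(1, 1)}
Q = [[identity[i][j] - P[i][j] for j in range(2)] for i in range(2)]  # I - P

P_squared = matmul(P, P)   # equals P, as in (2.104)
Q_squared = matmul(Q, Q)   # equals Q, as in (2.105)
image_of_w = apply(P, [1, 1])  # the zero vector, since (1, 1) lies in the kernel
image_of_u = apply(P, [1, 0])  # (1, 0) itself, as in (2.102)
```

Note that this $P$ is not an orthogonal projection for the standard inner product,
since $U$ and $W$ are not orthogonal to each other.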

Let U and W be linear subspaces of V again, and let q be the canonical 
quotient mapping from V onto V/W, as in the previous section. It is easy to 
see that U and W are complementary in V if and only if the restriction of q to 
U is a one-to-one mapping from U onto V/W. Suppose that this is the case, 
and let P be the corresponding projection of V onto U with kernel W. 

Let $\|\cdot\|$ be a norm on $V$, let $\|\cdot\|_{op}$ be the corresponding operator norm for
linear mappings on $V$, and let $\|\cdot\|_Q$ be the corresponding quotient norm on
$V/W$. If $u \in U$ and $w \in W$, then $P(u + w) = u$, and hence

(2.106)    $\|u\| = \|P(u + w)\| \le \|P\|_{op} \, \|u + w\|$.

This implies that

(2.107)    $\|P\|_{op}^{-1} \, \|u\| \le \|q(u)\|_Q \le \|u\|$

for every $u \in U$, when $P \ne 0$.

Observe that

(2.108)    $\|P\|_{op} = \|P \circ P\|_{op} \le \|P\|_{op} \, \|P\|_{op}$.

Thus

(2.109)    $\|P\|_{op} \ge 1$

when $P \ne 0$. If $\|P\|_{op} = 1$, then we get that

(2.110)    $\|q(u)\|_Q = \|u\|$

for every $u \in U$. Orthogonal projections onto nontrivial subspaces of inner
product spaces have operator norm equal to 1, for instance.

Suppose that $V = \mathbf{R}^n$ or $\mathbf{C}^n$ for some positive integer $n$, and let $I$ be a
subset of the set $\{1, \ldots, n\}$ of positive integers less than or equal to $n$. Let $U_I$
be the linear subspace of $V$ consisting of vectors $u$ such that $u_j = 0$ when $j \notin I$,
and let $W_I$ be the complementary subspace consisting of vectors $w$ such that
$w_j = 0$ when $j \in I$. The associated projection $P_I$ of $V$ onto $U_I$ with kernel $W_I$
sends $v \in V$ to the vector whose $j$th coordinate is equal to $v_j$ when $j \in I$, and
to 0 when $j \notin I$. If $V$ is equipped with a norm $\|\cdot\|_p$ as in Section 2.1 for some
$p$, $1 \le p \le \infty$, then the corresponding operator norm of $P_I$ is equal to 1 when
$I \ne \emptyset$, so that $P_I \ne 0$.

Let $V$ be any real or complex vector space with a norm $\|\cdot\|$ again, and let
$u_1$ be an element of $V$ such that $\|u_1\| = 1$. As in Section 2.3, there is a linear
functional $\lambda$ on $V$ such that

(2.111)    $\lambda(u_1) = 1$

and the dual norm of $\lambda$ with respect to the given norm $\|\cdot\|$ on $V$ is equal to 1.
Under these conditions, one can check that

(2.112)    $P(v) = \lambda(v) \, u_1$

is a projection of $V$ onto the 1-dimensional linear subspace $U$ of $V$ spanned by
$u_1$ with operator norm equal to 1.



2.10 Extensions and liftings 

Let $V_1$ and $V_2$ be vector spaces, both real or both complex, and equipped with
norms. If $U_1$ is a linear subspace of $V_1$, and $T$ is a linear mapping from $U_1$ into
$V_2$, then it is easy to see that there is an extension $\widehat{T}$ of $T$ to a linear mapping
from $V_1$ into $V_2$. The operator norm of $\widehat{T}$ is automatically greater than or equal
to the operator norm of $T$, and one would like to choose $\widehat{T}$ so that its operator
norm is as small as possible. If $P_1$ is a projection from $V_1$ onto $U_1$, then the
composition $T \circ P_1$ is an extension of $T$ to $V_1$ whose operator norm is less than
or equal to the product of the operator norm of $T$ on $U_1$ and the operator norm
of $P_1$ on $V_1$. Conversely, if $V_2 = U_1$ and $T$ is the identity mapping on $U_1$, then
an extension of $T$ to a linear mapping from $V_1$ into $V_2$ is the same as a projection
from $V_1$ onto $U_1$.

If $V_2$ is 1-dimensional, then this extension problem is equivalent to the one
for linear functionals discussed in Section 2.3. Similarly, suppose that $V_2$ is $\mathbf{R}^n$
or $\mathbf{C}^n$ for some positive integer $n$, equipped with the norm $\|\cdot\|_\infty$ defined in
Section 2.1. In this case, a linear mapping $T$ from another vector space into $V_2$
is equivalent to $n$ linear functionals on the vector space, and the operator norm
of such a $T$ is equal to the maximum of the dual norms of the corresponding $n$ linear
functionals, as in the second part of Section 2.7. This permits the extension
problem for $T$ to be reduced to its counterpart for linear functionals again.

Now let $W_2$ be a linear subspace of $V_2$, and let $L$ be a linear mapping from
$V_1$ into $V_2/W_2$. It is easy to see that there is a linear mapping $\widehat{L}$ from $V_1$ into $V_2$
whose composition with the canonical quotient mapping $q_2$ from $V_2$ onto $V_2/W_2$
is equal to $L$, and one would like to choose $\widehat{L}$ so that its operator norm is as
small as possible. This problem is dual to the extension problem discussed in
the previous paragraphs. Of course, the operator norm of $\widehat{L}$ is greater than or
equal to the operator norm of $L$, with respect to the quotient norm on $V_2/W_2$
that corresponds to the given norm on $V_2$. One way to approach this problem
is to use a linear subspace $U_2$ of $V_2$ which is complementary to $W_2$, so that the
restriction of $q_2$ to $U_2$ is a one-to-one linear mapping of $U_2$ onto $V_2/W_2$. In this
case, one can get a lifting $\widehat{L}$ of $L$ to $V_2$ by composing $L$ with the inverse of the
restriction of $q_2$ to $U_2$. Conversely, if $V_1 = V_2/W_2$ and $L$ is the identity mapping
on $V_2/W_2$, then a lifting of $L$ to a linear mapping $\widehat{L}$ from $V_2/W_2$ into $V_2$ whose
composition with $q_2$ is the identity mapping on $V_2/W_2$ would map $V_2/W_2$ onto
a linear subspace $U_2$ of $V_2$ which is complementary to $W_2$.

Suppose that $V_1$ has dimension 1, and let $v_1$ be an element of $V_1$ with norm
1. Let $L$ be a linear mapping from $V_1$ into $V_2/W_2$, and let $v_2$ be an element of
$V_2$ such that $q_2(v_2) = L(v_1)$ and the norm of $v_2$ in $V_2$ is equal to the quotient
norm of $L(v_1)$ in $V_2/W_2$. The existence of $v_2$ follows from the discussion of
minimization at the end of Section 2.8. If $\widehat{L}$ is the linear mapping from $V_1$ into
$V_2$ that sends $v_1$ to $v_2$, then $\widehat{L}$ is a lifting of $L$ with the same operator norm
as $L$. Similarly, if $V_1$ is $\mathbf{R}^n$ or $\mathbf{C}^n$ with the norm $\|\cdot\|_1$ from Section 2.1, then
one can get a lifting $\widehat{L}$ of $L$ to a linear mapping from $V_1$ into $V_2$ with the same
operator norm as $L$ by lifting the $n$ vectors $L(e_j)$ in $V_2/W_2$ to $V_2$ for each of
the standard basis vectors $e_j$ in $\mathbf{R}^n$ or $\mathbf{C}^n$, since the operator norm of a linear
mapping on $V_1$ may be computed as in the first part of Section 2.7.

2.11 Minimizing distances 

Let $V$ be a real or complex vector space with a norm $\|\cdot\|$, and let $W$ be a linear
subspace of $V$. If $v$ is any element of $V$, then there is a $w_1 \in W$ such that

(2.113)    $\|v - w_1\| \le \|v - w\|$

for every $w \in W$. This is equivalent to the minimization problem discussed at
the end of Section 2.8, with $w$ replaced by $-w$ and $w_1 = -w_0$.

Suppose for the moment that there is an inner product $\langle \cdot, \cdot \rangle$ on $V$ for which
$\|\cdot\|$ is the corresponding norm, and let $P_W(v)$ be the orthogonal projection of $v$
onto $W$, as in Section 2.6. Thus $P_W(v) \in W$ and $v - P_W(v) \in W^\perp$, as in (2.82).
If $w$ is any element of $W$, then it follows that $P_W(v) - w \in W$, and hence

(2.114)    $\|v - w\|^2 = \|v - P_W(v)\|^2 + \|P_W(v) - w\|^2$,

because $(v - P_W(v)) \perp (P_W(v) - w)$. This implies that $P_W(v)$ minimizes the
distance to $v$ among elements of $W$ in this case, and that $P_W(v)$ is the only
element of $W$ with this property.

Let $\|\cdot\|$ be any norm on $V$ again, and let $\|\cdot\|_{op}$ be the corresponding operator
norm for linear mappings on $V$. If $P_1$ is a projection of $V$ onto $W$, then the
kernel of $I - P_1$ is equal to $W$, and hence

(2.115)    $\|v - P_1(v)\| = \|(I - P_1)(v)\| = \|(I - P_1)(v - w)\| \le \|I - P_1\|_{op} \, \|v - w\|$

for every $w \in W$. If $\|I - P_1\|_{op} = 1$, then it follows that

(2.116)    $\|v - P_1(v)\| \le \|v - w\|$

for every $w \in W$, so that $w_1 = P_1(v) \in W$ satisfies (2.113).

Suppose for the sake of convenience now that $V = \mathbf{R}^n$ or $\mathbf{C}^n$ for some
positive integer $n$, which is not a real restriction, since every real or complex
vector space of positive finite dimension is isomorphic to one of these. Let $E$ be
a nonempty subset of $V$ which is closed with respect to the standard Euclidean
metric and topology on $V$. If $v \in V$, then there is a $w_1 \in E$ which minimizes the
distance to $v$ with respect to the norm $\|\cdot\|$ on $V$, in the sense that (2.113) holds
for every $w \in E$. This follows from the same type of argument using continuity
and compactness as before. More precisely, although $E$ may not be bounded,
and hence not compact, it suffices to consider a bounded subset of $E$ for this
minimization problem.

Let $V$ be any real or complex vector space again, and let $B_1$ be the closed
unit ball associated to the norm $\|\cdot\|$ on $V$, as in Section 2.1. Let us say that
$B_1$ is strictly convex if

(2.117)    $\|t v + (1 - t) w\| < 1$

for every $v, w \in V$ with $\|v\| = \|w\| = 1$ and $v \ne w$ and every real number $t$
with $0 < t < 1$. The unit ball in any inner product space is strictly convex,
as one can show by analyzing the case of equality in the proof of the triangle
inequality. If $V = \mathbf{R}^n$ or $\mathbf{C}^n$ with the norm $\|\cdot\|_p$ as in Section 2.1 for some $p$,
$1 < p < \infty$, then one can check that the unit ball is strictly convex, using the
strict convexity of the function $|r|^p$. If $n \ge 2$ and $p = 1$ or $\infty$, then it is easy to
see that the unit ball is not strictly convex.

Let $E$ be a nonempty convex set in $V$, and let $v$ be an element of $V$. Suppose
that $w_1, w_2 \in E$ both minimize the distance to $v$ with respect to $\|\cdot\|$ in $V$, in
the sense that

(2.118)    $\|v - w_1\| = \|v - w_2\| \le \|v - w\|$

for every $w \in E$. If $0 < t < 1$, then $w = t w_1 + (1 - t) w_2 \in E$, because $E$ is
convex. However, if $B_1$ is strictly convex and $w_1 \ne w_2$, then the norm of

(2.119)    $v - w = t (v - w_1) + (1 - t)(v - w_2)$

is strictly less than the common value of the norms of $v - w_1$ and $v - w_2$,
contradicting (2.118). This shows that $w_1 = w_2$ under these conditions when
$B_1$ is strictly convex.

Note that we could simply take t = 1/2 in the preceding argument. If the 
norm on V is associated to an inner product, then the strict convexity property 
of the unit ball with t = 1/2 follows from the parallelogram law (2.90). 



Chapter 3

Structure of linear operators

In this chapter, we continue to restrict our attention to finite-dimensional vector 
spaces. 

3.1 The spectrum and spectral radius 

Let V be a complex vector space with positive dimension, and let T be a linear
operator from V into V. The spectrum of T is the set of complex numbers $\alpha$
such that $\alpha$ is an eigenvalue of T, which is to say that there is a $v \in V$ such
that $v \ne 0$ and

(3.1)    $T(v) = \alpha\,v.$

In this case, v is said to be an eigenvector of T with eigenvalue $\alpha$. If $\alpha$ is an
eigenvalue of T, then

(3.2)    $\{v \in V : T(v) = \alpha\,v\}$

is a nontrivial linear subspace of V, called the eigenspace of T associated to $\alpha$.

It is well known that a one-to-one linear mapping $R : V \to V$ automatically
maps V onto itself, and hence is invertible, because $R(V)$ is a linear subspace
of V with the same dimension as V. If R is not invertible on V, then it follows
that R is not one-to-one, so that the kernel of R is nontrivial. By definition,
$\alpha \in \mathbf{C}$ is an eigenvalue of T when the kernel of $T - \alpha I$ is nontrivial, where I
denotes the identity transformation on V. Equivalently, $\alpha \in \mathbf{C}$ is not in the
spectrum of T when $T - \alpha I$ is an invertible linear operator on V.

A famous theorem states that every linear operator T on V has at least one
eigenvalue. To see this, note that $\alpha \in \mathbf{C}$ lies in the spectrum of T exactly when
the determinant of $T - \alpha I$ is 0. The determinant of $T - \alpha I$ is a polynomial in
$\alpha$, whose degree is equal to the dimension of V. By the "Fundamental Theorem
of Algebra", $\det(T - \alpha I)$ has at least one root, as desired.

This argument also shows that the number of distinct eigenvalues of T is less
than or equal to the dimension of V, since a polynomial of degree n has at most
n roots. The spectral radius rad(T) of T is defined to be the maximum of $|\alpha|$,
where $\alpha \in \mathbf{C}$ is an eigenvalue of T. Thus $T - \alpha I$ is invertible when $|\alpha| > \operatorname{rad}(T)$,
and rad(T) is the smallest nonnegative real number with this property.

Let $\|\cdot\|$ be a norm on V, and let $\|\cdot\|_{op}$ be the corresponding operator norm
for linear transformations on V, as in Section 2.4. If $\alpha \in \mathbf{C}$ is an eigenvalue of
T, and $v \in V$ is a nonzero eigenvector corresponding to $\alpha$, then

(3.3)    $|\alpha|\,\|v\| = \|T(v)\| \le \|T\|_{op}\,\|v\|,$

and hence $|\alpha| \le \|T\|_{op}$. It follows that $\operatorname{rad}(T) \le \|T\|_{op}$.

Now let n be a positive integer, and let us check that

(3.4)    $\operatorname{rad}(T^n) = \operatorname{rad}(T)^n.$

If $\alpha$ is an eigenvalue of T, then $\alpha^n$ is obviously an eigenvalue of $T^n$ for each n,
and hence

(3.5)    $\operatorname{rad}(T)^n \le \operatorname{rad}(T^n).$

To get the opposite inequality, suppose that $\beta$ is an eigenvalue of $T^n$, and let us
show that $\alpha$ is an eigenvalue of T for some complex number $\alpha$ such that $\alpha^n = \beta$.
Let $\alpha_1, \ldots, \alpha_n$ be the nth roots of $\beta$, so that

(3.6)    $z^n - \beta = (z - \alpha_1) \cdots (z - \alpha_n).$

This implies that

(3.7)    $T^n - \beta I = (T - \alpha_1 I) \cdots (T - \alpha_n I),$

where the product of linear operators on V is defined by their composition. If
$T - \alpha_j I$ is invertible on V for each $j = 1, \ldots, n$, then it follows that $T^n - \beta I$ is
also invertible on V, because the composition of invertible operators is invertible.
If $\beta$ is an eigenvalue of $T^n$, then $T^n - \beta I$ is not invertible, and hence $T - \alpha_j I$
is not invertible for some j. This says exactly that $\alpha_j$ is an eigenvalue of T for
some j, as desired.
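
The relations $\operatorname{rad}(T) \le \|T\|_{op}$ and (3.4) can be checked numerically in a small case. The sketch below works with a 2 x 2 matrix, finds its eigenvalues from the characteristic polynomial, and uses the operator norm associated to the sup norm on $\mathbf{C}^2$, which is the largest absolute row sum. All function names and the sample matrix are illustrative assumptions, not notation from the text.

```python
import cmath

def eigenvalues_2x2(m):
    """Roots of the characteristic polynomial z^2 - tr(m) z + det(m)."""
    (a, b), (c, d) = m
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2

def spectral_radius(m):
    """Maximum modulus of the eigenvalues."""
    return max(abs(z) for z in eigenvalues_2x2(m))

def matmul(x, y):
    """Product of two 2x2 matrices, corresponding to composition of operators."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def op_norm_sup(m):
    """Operator norm with respect to the sup norm: the largest absolute row sum."""
    return max(sum(abs(e) for e in row) for row in m)

T = [[0.0, -2.0], [1.0, 0.0]]   # eigenvalues are i*sqrt(2) and -i*sqrt(2)
T2 = matmul(T, T)               # equals -2 times the identity

rad_T = spectral_radius(T)
rad_T2 = spectral_radius(T2)
print(rad_T, rad_T2, op_norm_sup(T))
```

Here the spectral radius of the square agrees with the square of the spectral radius up to rounding, and the spectral radius does not exceed the operator norm.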



3.2 Adjoints 

In this section, both real and complex vector spaces are allowed. Let $(V, \langle \cdot, \cdot \rangle)$
be an inner product space. If T is a linear operator on V, then there is a unique
linear operator $T^*$ on V such that

(3.8)    $\langle T(v), w \rangle = \langle v, T^*(w) \rangle$

for every $v, w \in V$, called the adjoint of T. More precisely, for each $w \in V$,

(3.9)    $\mu_w(v) = \langle T(v), w \rangle$

defines a linear functional on V. This implies that there is a unique element
$T^*(w)$ of V such that

(3.10)    $\mu_w(v) = \langle v, T^*(w) \rangle$

for every $v \in V$, as in Section 2.6. One can check that $T^*$ is linear on V, using
the fact that $T^*(w)$ is uniquely determined by (3.10). Alternatively, if we fix an
orthonormal basis for V, then we can express T in terms of a matrix relative
to this basis. Remember that the transpose of a matrix $(a_{j,k})$ is the matrix
$(b_{j,k})$ given by $b_{j,k} = a_{k,j}$. If V is a real vector space, then $T^*$ is the linear
transformation on V that corresponds to the transpose of the matrix for T with
respect to the same basis. In the complex case, the entries of the matrix for
$T^*$ are the complex conjugates of the entries of the transpose of the matrix for
T. It is easy to see that $T^*$ is uniquely determined by (3.8), so that different
orthonormal bases for V lead to the same linear transformation $T^*$ when one
computes $T^*$ in terms of matrices.
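
The matrix description of the adjoint can be tested directly against (3.8). The following sketch assumes $V = \mathbf{C}^2$ with the standard inner product, taken to be linear in the first variable and conjugate-linear in the second; the helper names and the sample data are hypothetical.

```python
def inner(v, w):
    """Standard inner product on C^n, linear in v and conjugate-linear in w."""
    return sum(a * b.conjugate() for a, b in zip(v, w))

def apply(m, v):
    """Apply the matrix m to the vector v."""
    return [sum(m[j][k] * v[k] for k in range(len(v))) for j in range(len(m))]

def adjoint(m):
    """Conjugate transpose: entry (j, k) is the conjugate of entry (k, j) of m."""
    rows, cols = len(m), len(m[0])
    return [[m[k][j].conjugate() for k in range(rows)] for j in range(cols)]

T = [[1 + 2j, 3j], [4 + 0j, 5 - 1j]]
v = [1 + 1j, 2 - 1j]
w = [0.5j, -3 + 2j]

lhs = inner(apply(T, v), w)            # <T(v), w>
rhs = inner(v, apply(adjoint(T), w))   # <v, T*(w)>
print(lhs, rhs)
```

The two values agree up to rounding, and applying `adjoint` twice returns the original matrix.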

Although we are using the same notation here for the adjoint as we did in 
Section 2.4 for dual linear mappings, we should be careful about some of the 
differences. In the real case, we can identify V with its dual space V*, since 
every linear functional on V can be represented as 

(3.11)    $\lambda_w(v) = \langle v, w \rangle$

for some $w \in V$. In this case, it is easy to see that the adjoint of T corresponds
exactly to the dual linear transformation defined previously. However, in the
complex case, the mapping from $w \in V$ to the linear functional $\lambda_w \in V^*$ is
not quite linear, but rather conjugate-linear, in the sense that multiplication
of w by a complex number a corresponds to multiplying $\lambda_w$ by the complex
conjugate $\overline{a}$ of a. Thus the adjoint of T is not quite the same as the dual linear
transformation defined earlier in the complex case, which is also reflected in the 
linearity properties of the mapping from T to T* discussed next. 

Note that $I^* = I$, where I is the identity transformation on V. If S and T
are linear transformations on V and a, b are scalars, then

(3.12)    $(a\,S + b\,T)^* = a\,S^* + b\,T^*$

when V is a real vector space, and

(3.13)    $(a\,S + b\,T)^* = \overline{a}\,S^* + \overline{b}\,T^*$

when V is a complex vector space. Also, $(T^*)^* = T$, and

(3.14)    $(S\,T)^* = T^*\,S^*.$

If T is invertible, then $T^*$ is invertible, and

(3.15)    $(T^*)^{-1} = (T^{-1})^*.$

It follows that $T - \lambda I$ is invertible if and only if $T^* - \lambda I$ is invertible for each
$\lambda \in \mathbf{R}$ in the real case, and similarly that $T - \lambda I$ is invertible if and only if
$T^* - \overline{\lambda} I$ is invertible for each $\lambda \in \mathbf{C}$ in the complex case.

Let $\|\cdot\|$ be the norm on V associated to the inner product $\langle \cdot, \cdot \rangle$, and let
$\|\cdot\|_{op}$ be the corresponding operator norm for linear transformations on V with
respect to $\|\cdot\|$. Let us check that

(3.16)    $\|T\|_{op} = \sup\{|\langle T(v), w \rangle| : v, w \in V,\ \|v\|, \|w\| \le 1\}$

for any linear transformation T on V. The right side of (3.16) is clearly less than
or equal to the operator norm of T, because of the Cauchy-Schwarz inequality.
To get the opposite inequality, one can take $w = T(v)/\|T(v)\|$ in the right side
of (3.16) when $T(v) \ne 0$. It follows that

(3.17)    $\|T^*\|_{op} = \|T\|_{op},$

because the right side of (3.16) is equal to the analogous quantity for $T^*$.



3.3 Self-adjoint linear operators 

Let $(V, \langle \cdot, \cdot \rangle)$ be a real or complex inner product space, as in the previous section.
A linear transformation A on V is said to be self-adjoint if

(3.18)    $A^* = A.$

This is equivalent to the condition that

(3.19)    $\langle A(v), w \rangle = \langle v, A(w) \rangle$

for every $v, w \in V$. Thus the identity operator I on V is self-adjoint.

Let W be a linear subspace of V, and let $P_W$ be the orthogonal projection
of V onto W, as in Section 2.6. Remember that $P_W(v)$ is characterized by the
conditions $P_W(v) \in W$ and $v - P_W(v) \in W^\perp$, for each $v \in V$. This implies that

(3.20)    $\langle P_W(v), w \rangle = \langle P_W(v), P_W(w) \rangle = \langle v, P_W(w) \rangle$

for every $v, w \in V$, and hence that $P_W$ is self-adjoint on V.

If A and B are self-adjoint linear operators on V, then their sum A + B is 
self-adjoint. Similarly, if A is a self-adjoint linear operator on V and t is a real 
number, then t A is self-adjoint as well. Note that it is important to take $t \in \mathbf{R}$
here, even when V is a complex vector space. 

If A is a self-adjoint linear operator on V and V is complex, then it is easy
to see that

(3.21)    $\langle A(v), v \rangle \in \mathbf{R}$

for every $v \in V$. Using this, one can check that the eigenvalues of A are also
real numbers.

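Suppose that A is a self-adjoint linear operator on a real or complex inner
product space $(V, \langle \cdot, \cdot \rangle)$, and that $v \in V$ is an eigenvector of A with eigenvalue
$\lambda$. If $y \in V$ and $y \perp v$, then

(3.22)    $\langle v, A(y) \rangle = \langle A(v), y \rangle = \lambda\,\langle v, y \rangle = 0,$

so that $A(y) \perp v$. If $w \in V$ is an eigenvector of A with eigenvalue $\mu \ne \lambda$, then

(3.23)    $\lambda\,\langle v, w \rangle = \langle \lambda v, w \rangle = \langle A(v), w \rangle = \langle v, A(w) \rangle = \langle v, \mu w \rangle = \mu\,\langle v, w \rangle,$

which implies that $\langle v, w \rangle = 0$, because $\lambda \ne \mu$. In the complex case, the last
step in (3.23) uses the fact that $\mu$ is real.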

If the dimension n of V is positive, then one can use an orthonormal basis of
V to show that V is isomorphic to $\mathbf{R}^n$ or $\mathbf{C}^n$ with its standard inner product, and
we may as well take $V = \mathbf{R}^n$ or $\mathbf{C}^n$ for the moment. By the usual considerations
of continuity and compactness, $\langle A(v), v \rangle$ attains its maximum and minimum on
the unit sphere

(3.24)    $\{v \in V : \|v\| = 1\}.$

It is well known that the critical points of $\langle A(v), v \rangle$ on the unit sphere are exactly
the eigenvectors of A with norm 1, and hence that the maximum and minimum
of $\langle A(v), v \rangle$ on the unit sphere are attained at eigenvectors of A. In particular,
A has a nonzero eigenvector v, and A maps

(3.25)    $W = \{w \in V : \langle w, v \rangle = 0\}$

to itself, as in the previous paragraph. By repeating the process, one can show
that there is an orthonormal basis of V consisting of eigenvectors of A.
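
To make the role of $\langle A(v), v \rangle$ on the unit sphere concrete, here is a rough numerical sketch for a symmetric 2 x 2 real matrix, whose eigenvalues 1 and 3 are known in advance. The matrix, the sampling grid, and the function name `quadratic_form` are assumptions chosen for this illustration; sampling is not how one would locate eigenvectors in practice.

```python
import math

A = [[2.0, 1.0], [1.0, 2.0]]   # symmetric, with eigenvalues 1 and 3

def quadratic_form(a, v):
    """<A(v), v> for a real 2x2 matrix a and a vector v in R^2."""
    av = [a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1]]
    return av[0] * v[0] + av[1] * v[1]

# Sample the unit circle at the angles k*pi/1800 for k = 0, ..., 3599.
samples = [(math.cos(k * math.pi / 1800), math.sin(k * math.pi / 1800))
           for k in range(3600)]
values = [quadratic_form(A, v) for v in samples]
largest, smallest = max(values), min(values)
print(largest, smallest)
```

The sampled maximum and minimum match the largest and smallest eigenvalues, attained in the directions of the eigenvectors (1, 1) and (1, -1) after normalization.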

A linear operator T on a complex inner product space V is said to be normal 
if T and T* commute, which is to say that 

(3.26)    $T \circ T^* = T^* \circ T.$

If T can be diagonalized in an orthonormal basis, then $T^*$ is diagonalized by the
same basis, and T is normal. Conversely, one can show that a normal operator
T on V can be diagonalized in an orthonormal basis, as follows. Any linear
operator T on V can be expressed as $A + i\,B$, where

(3.27)    $A = \dfrac{T + T^*}{2}$  and  $B = \dfrac{T - T^*}{2i}$

are self-adjoint. Thus A and B can each be diagonalized in an orthonormal basis 
of V, and one would like to show that they can both be diagonalized by the same 
orthonormal basis when T is normal, which implies that A and B commute. If 
A and B are commuting linear transformations on any vector space, then it is 
easy to see that the eigenspaces of A are invariant under B. To diagonalize T in 
an orthonormal basis, one can first use a diagonalization of A to decompose V 
into an orthogonal sum of eigenspaces of A, and then diagonalize the restriction 
of B to each of the eigenspaces of A. 

A linear operator T on V is said to be an orthogonal transformation when
V is real, or a unitary transformation when V is complex, if

(3.28)    $\langle T(v), T(w) \rangle = \langle v, w \rangle$

for every $v, w \in V$. This implies that

(3.29)    $\|T(v)\| = \|v\|$

for every $v \in V$, and the converse holds because of polarization, as in Remark
2.87. This condition obviously implies that the kernel of T is trivial, and hence
that T is invertible, because V is supposed to be finite-dimensional. More
precisely, it is easy to see that T is orthogonal or unitary, as appropriate, if and
only if T is invertible and

(3.30)    $T^{-1} = T^*.$

In particular, unitary operators are normal, because T automatically commutes
with $T^{-1}$.



3.4 Anti-self-adjoint operators 

Let $(V, \langle \cdot, \cdot \rangle)$ be a real or complex inner product space, as before. A linear
operator R on V is said to be anti-self-adjoint if

(3.31)    $R^* = -R,$

which is equivalent to asking that

(3.32)    $\langle R(v), w \rangle = -\langle v, R(w) \rangle$

for every $v, w \in V$. If V is a complex vector space, then R is anti-self-adjoint if
and only if $R = i\,B$ for some self-adjoint linear operator B on V. Note that the
sum of two anti-self-adjoint linear operators on V is also anti-self-adjoint, as is
the product of an anti-self-adjoint linear operator and a real number.

If V is a real inner product space and R is an anti-self-adjoint linear operator
on V, then

(3.33)    $\langle R(v), v \rangle = -\langle v, R(v) \rangle = -\langle R(v), v \rangle$

for every $v \in V$, using the symmetry of the inner product on V in the second
step. This implies that

(3.34)    $\langle R(v), v \rangle = 0$

for every $v \in V$, and hence that any eigenvalue of R must be equal to 0, if there
is one. The analogous argument in the complex case would only give that

(3.35)    $\langle R(v), v \rangle$

is purely imaginary for each $v \in V$ when R is anti-self-adjoint, and hence that the
eigenvalues of R are purely imaginary, which also follows from the representation
of R as $i\,B$ for some self-adjoint linear operator B on V. As in the case of
self-adjoint operators, if $v, w \in V$ satisfy $R(v) = 0$ and $v \perp w$, then it is easy to see
that $v \perp R(w)$ too.

If R is an anti-self-adjoint linear operator on V, then

(3.36)    $\langle R^2(v), w \rangle = -\langle R(v), R(w) \rangle = \langle v, R^2(w) \rangle$

for every $v, w \in V$. This shows that $R^2 = R \circ R$ is a self-adjoint linear operator
on V, and in particular that $R^2$ can be diagonalized in an orthonormal basis in
V, as in the previous section. If we take $v = w$ in (3.36), then we get that

(3.37)    $\langle R^2(v), v \rangle = -\langle R(v), R(v) \rangle = -\|R(v)\|^2$

for every $v \in V$. Of course, $R(v) = 0$ implies that $R^2(v) = R(R(v)) = 0$
trivially, and (3.37) shows that $R^2(v) = 0$ implies that $R(v) = 0$ in this case.

If T is any linear operator on V, then T can be expressed as

(3.38)    $T = A + R,$

where $A = (T + T^*)/2$ is self-adjoint, and $R = (T - T^*)/2$ is anti-self-adjoint. If
T commutes with $T^*$, then A commutes with R, and hence A commutes with $R^2$.
As in the previous section, one can show that there is an orthonormal basis of V
in which A and $R^2$ are simultaneously diagonalized under these conditions. One
also gets that the eigenspaces of A are invariant under R, because A commutes 
with R. In particular, these remarks can be applied to the case of an orthogonal 
linear transformation T on a real inner product space V. 
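
As a worked illustration of these remarks, consider the rotation by a right angle on $\mathbf{R}^2$ with the standard inner product. The choice of this particular operator is an assumption made here for the sake of a concrete example; it is not taken from the text.

```latex
% Rotation by 90 degrees on R^2, written in the standard basis.
\[
  R = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \qquad
  R^* = R^{T} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = -R .
\]
% (3.34): the quadratic form vanishes identically.
\[
  \langle R(v), v \rangle = (-v_2)\,v_1 + v_1\,v_2 = 0
  \qquad \text{for } v = (v_1, v_2) \in \mathbf{R}^2 .
\]
% (3.36)-(3.37): the square is self-adjoint and nonpositive.
\[
  R^2 = -I, \qquad
  \langle R^2(v), v \rangle = -\|v\|^2 = -\|R(v)\|^2 .
\]
% R has no real eigenvalue, consistent with the fact that 0 is the only
% possible eigenvalue and R(v) = 0 only for v = 0.
```

In this example R is also orthogonal, since $R^{-1} = -R = R^*$, and in the decomposition (3.38) of $T = R$ the self-adjoint part A is 0.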

3.5 The C*-identity 

Any linear operator T on a real or complex inner product space V satisfies

(3.39)    $\|T^* T\|_{op} \le \|T^*\|_{op}\,\|T\|_{op} = \|T\|_{op}^2,$

using (3.17) in the second step. Conversely,

(3.40)    $\|T(v)\|^2 = \langle T(v), T(v) \rangle = \langle T^*(T(v)), v \rangle$

for every $v \in V$, and hence

(3.41)    $\|T(v)\|^2 \le \|(T^* T)(v)\|\,\|v\| \le \|T^* T\|_{op}\,\|v\|^2.$

Thus $\|T\|_{op}^2 \le \|T^* T\|_{op}$, which implies that

(3.42)    $\|T^* T\|_{op} = \|T\|_{op}^2.$

This is known as the C*-identity.
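
The identity (3.42) can be sanity-checked for a real 2 x 2 matrix. The sketch below relies on a standard fact that is not derived in the text, namely that the Euclidean operator norm of a matrix is the square root of the largest eigenvalue of its transpose times itself, so it should be read as a consistency check rather than an independent verification. The function names and the sample matrix are hypothetical.

```python
import math

def transpose(m):
    return [[m[k][j] for k in range(2)] for j in range(2)]

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def largest_eigenvalue_sym(s):
    """Largest eigenvalue of a real symmetric 2x2 matrix."""
    half_trace = (s[0][0] + s[1][1]) / 2
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    return half_trace + math.sqrt(max(half_trace ** 2 - det, 0.0))

def op_norm(m):
    """Euclidean operator norm: square root of the top eigenvalue of m^T m."""
    return math.sqrt(largest_eigenvalue_sym(matmul(transpose(m), m)))

T = [[1.0, 2.0], [0.0, 3.0]]
TstarT = matmul(transpose(T), T)   # the matrix of T* T in the real case

print(op_norm(TstarT), op_norm(T) ** 2)
```

Both printed values agree up to rounding, as do the operator norms of T and its transpose, in line with (3.17).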

Let $(V_1, \langle \cdot, \cdot \rangle_1)$, $(V_2, \langle \cdot, \cdot \rangle_2)$ be inner product spaces which are both real or
both complex, and with norms $\|\cdot\|_1$, $\|\cdot\|_2$ associated to their inner products,
respectively. Let $\|T\|_{op,ab}$ be the operator norm of a linear mapping $T : V_a \to V_b$
using $\|\cdot\|_a$ on the domain and $\|\cdot\|_b$ on the range, where $a, b = 1, 2$. This can
be characterized in terms of inner products by

(3.43)    $\|T\|_{op,ab} = \sup\{|\langle T(v), w \rangle_b| : v \in V_a,\ w \in V_b,\ \|v\|_a, \|w\|_b \le 1\},$

as in (3.16).

If T is a linear mapping from $V_1$ into $V_2$, then there is a unique linear mapping
$T^* : V_2 \to V_1$ such that

(3.44)    $\langle T(v), w \rangle_2 = \langle v, T^*(w) \rangle_1$

for every $v \in V_1$ and $w \in V_2$, again called the adjoint of T. As before,

(3.45)    $\mu_w(v) = \langle T(v), w \rangle_2$

defines a linear functional on $V_1$ for each $w \in V_2$, which can be represented as

(3.46)    $\mu_w(v) = \langle v, T^*(w) \rangle_1$

for a unique element $T^*(w)$ of $V_1$, and one can check that $T^* : V_2 \to V_1$ is linear.
Otherwise, one can get $T^*$ using orthonormal bases for $V_1$ and $V_2$, with respect
to which the matrix for $T^*$ is equal to the transpose or the complex conjugate of
the transpose of the corresponding matrix for T, depending on whether $V_1$, $V_2$
are real or complex vector spaces. As in Section 3.2, the adjoint of $T : V_1 \to V_2$
is very similar to the dual linear mapping discussed in Section 2.4, but there are
some differences, especially when $V_1$ and $V_2$ are complex.

If $T : V_1 \to V_2$ is a linear mapping and if a is a real or complex number, as
appropriate, then

(3.47)    $(a\,T)^* = a\,T^*$

in the real case, and

(3.48)    $(a\,T)^* = \overline{a}\,T^*$

in the complex case. If $S, T : V_1 \to V_2$ are linear mappings, then

(3.49)    $(S + T)^* = S^* + T^*.$

It is easy to see that $(T^*)^* = T$ for every $T : V_1 \to V_2$. If $V_1$, $V_2$, and $V_3$ are inner
product spaces, all real or all complex, and if $T_1 : V_1 \to V_2$ and $T_2 : V_2 \to V_3$
are linear mappings, then

(3.50)    $(T_2 \circ T_1)^* = T_1^* \circ T_2^*$

as linear mappings from $V_3$ into $V_1$. A linear mapping $T : V_1 \to V_2$ is invertible
if and only if $T^* : V_2 \to V_1$ is invertible, in which case

(3.51)    $(T^{-1})^* = (T^*)^{-1}.$

Using (3.43), we get that

(3.52)    $\|T^*\|_{op,21} = \|T\|_{op,12}$

for any linear mapping $T : V_1 \to V_2$, as in (3.17). The C*-identities

(3.53)    $\|T^* T\|_{op,11} = \|T\,T^*\|_{op,22} = \|T\|_{op,12}^2$

can be verified in this setting as well.


3.6 The trace norm

Let $V_1$ and $V_2$ be vector spaces, both real or both complex, equipped with norms
$\|\cdot\|_1$ and $\|\cdot\|_2$, respectively. Let $V_1^*$, $V_2^*$ and $\|\cdot\|_1^*$, $\|\cdot\|_2^*$ be the corresponding
dual spaces and norms, as in Section 2.2.

Any linear mapping $T : V_1 \to V_2$ can be expressed as

(3.54)    $T(v) = \sum_{j=1}^N \lambda_j(v)\,w_j,$

where N is a positive integer, $\lambda_1, \ldots, \lambda_N \in V_1^*$, and $w_1, \ldots, w_N \in V_2$. The trace
norm of T relative to $\|\cdot\|_1$ and $\|\cdot\|_2$ is defined to be the infimum of

(3.55)    $\sum_{j=1}^N \|\lambda_j\|_1^*\,\|w_j\|_2$

over all such representations of T, and is denoted $\|T\|_{tr}$, or $\|T\|_{tr,12}$ to indicate
the role of the norms $\|\cdot\|_1$, $\|\cdot\|_2$.

If $\lambda \in V_1^*$, $w \in V_2$, and $A(v) = \lambda(v)\,w$, then

(3.56)    $\|A\|_{op,12} = \|\lambda\|_1^*\,\|w\|_2,$

where $\|A\|_{op,12}$ is the operator norm of A with respect to the norms $\|\cdot\|_1$ and
$\|\cdot\|_2$. Thus

(3.57)    $\|T\|_{op,12} \le \sum_{j=1}^N \|\lambda_j\|_1^*\,\|w_j\|_2$

for each representation (3.54) of T, and therefore

(3.58)    $\|T\|_{op,12} \le \|T\|_{tr,12}.$

Using this, one can check that the trace norm is a norm on the vector space of
linear mappings from $V_1$ to $V_2$. If $A(v) = \lambda(v)\,w$, where $\lambda \in V_1^*$ and $w \in V_2$,
then

(3.59)    $\|\lambda\|_1^*\,\|w\|_2 = \|A\|_{op,12} \le \|A\|_{tr,12} \le \|\lambda\|_1^*\,\|w\|_2,$

and hence the operator and trace norms of A are the same.

Suppose that $V_3$ is another vector space which is real or complex depending
on whether $V_1$, $V_2$ are real or complex, and that $\|\cdot\|_3$ is a norm on $V_3$. If
$T_1 : V_1 \to V_2$, $T_2 : V_2 \to V_3$ are linear mappings, then

(3.60)    $\|T_2 \circ T_1\|_{tr,13} \le \|T_1\|_{op,12}\,\|T_2\|_{tr,23}$

and

(3.61)    $\|T_2 \circ T_1\|_{tr,13} \le \|T_1\|_{tr,12}\,\|T_2\|_{op,23}.$

Here $\|\cdot\|_{op,ab}$ and $\|\cdot\|_{tr,ab}$ are the operator and trace norms for linear mappings
from $V_a$ to $V_b$, $a, b = 1, 2, 3$. This follows by converting representations of the
form (3.54) for $T_1$ or $T_2$ into similar representations for $T_2 \circ T_1$.

Let us briefly review the notion of the trace of a linear mapping. Fix a
vector space V, and suppose that A is a linear mapping from V to itself. Let
$v_1, \ldots, v_n$ be a basis for V, so that every element of V can be expressed as a
linear combination of the $v_j$'s in exactly one way. With respect to this basis,
A can be described by an n x n matrix $(a_{j,k})$ of real or complex numbers, as
appropriate, through the formula

(3.62)    $A(v_k) = \sum_{j=1}^n a_{j,k}\,v_j.$

The trace of A is denoted tr A and defined by

(3.63)    $\operatorname{tr} A = \sum_{j=1}^n a_{j,j}.$

Clearly tr A is linear in A, and one can check that

(3.64)    $\operatorname{tr}(A \circ B) = \operatorname{tr}(B \circ A)$

for any linear transformations A and B on V. In particular,

(3.65)    $\operatorname{tr}(T \circ A \circ T^{-1}) = \operatorname{tr} A$

for every invertible linear transformation T on V. This implies that the trace
does not depend on the choice of basis for V.
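
The identities (3.64) and (3.65) are easy to confirm on small integer matrices, where the arithmetic is exact. The following sketch uses hypothetical 2 x 2 matrices A, B and an invertible T whose inverse also has integer entries; the helper names are illustrative.

```python
def matmul(x, y):
    """Product of two square matrices, corresponding to composition."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(m):
    """Sum of the diagonal entries, as in (3.63)."""
    return sum(m[j][j] for j in range(len(m)))

A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]
T = [[2, 1], [1, 1]]          # determinant 1
T_inv = [[1, -1], [-1, 2]]    # inverse of T, again with integer entries

tr_AB = trace(matmul(A, B))
tr_BA = trace(matmul(B, A))
tr_conjugate = trace(matmul(matmul(T, A), T_inv))
print(tr_AB, tr_BA, tr_conjugate, trace(A))
```

The first two traces coincide even though the products A B and B A are different matrices, and conjugating A by T leaves the trace unchanged.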

Suppose further that V is equipped with a norm $\|\cdot\|$, and let $\|\cdot\|_{tr}$ be the
corresponding trace norm for operators on V. If A is any linear transformation
on V, then

(3.66)    $|\operatorname{tr} A| \le \|A\|_{tr}.$

To prove this, it suffices to show that

(3.67)    $|\operatorname{tr} A| \le \sum_{i=1}^N \|\lambda_i\|^*\,\|w_i\|$

whenever $\lambda_1, \ldots, \lambda_N \in V^*$, $w_1, \ldots, w_N \in V$, and $A(v) = \sum_{i=1}^N \lambda_i(v)\,w_i$. By
linearity, it is enough to check that

(3.68)    $|\operatorname{tr} A| \le \|\lambda\|^*\,\|w\|$

when $\lambda \in V^*$, $w \in V$, and $A(v) = \lambda(v)\,w$. In this case, $\operatorname{tr} A = \lambda(w)$, and
$|\lambda(w)| \le \|\lambda\|^*\,\|w\|$ by definition of the dual norm.

Let us return to the setting of two vector spaces $V_1$, $V_2$, with norms $\|\cdot\|_1$,
$\|\cdot\|_2$. If $T_1 : V_1 \to V_2$ and $T_2 : V_2 \to V_1$ are linear mappings, then

(3.69)    $\operatorname{tr}_{V_1}(T_2 \circ T_1) = \operatorname{tr}_{V_2}(T_1 \circ T_2).$

Here the trace on the left applies to linear operators on $V_1$, and the trace on the
right applies to linear operators on $V_2$, as indicated by the notation. By (3.66),

(3.70)    $|\operatorname{tr}_{V_1}(T_2 \circ T_1)| = |\operatorname{tr}_{V_2}(T_1 \circ T_2)|$

is less than or equal to the trace norm of the composition of $T_1$ and $T_2$ in either
order. Hence it is less than or equal to the product of the trace norm of $T_1$ and
the operator norm of $T_2$, or the operator norm of $T_1$ times the trace norm of $T_2$.

Remember that $\mathcal{L}(V_1, V_2)$ denotes the vector space of linear mappings from
$V_1$ into $V_2$. If R is a linear mapping from $V_2$ into $V_1$, then

(3.71)    $T \mapsto \operatorname{tr}_{V_1}(R \circ T)$

is a linear functional on $\mathcal{L}(V_1, V_2)$. This defines a linear isomorphism from $\mathcal{L}(V_2, V_1)$
onto $\mathcal{L}(V_1, V_2)^*$.

Fix a linear mapping $R : V_2 \to V_1$, and let us check that the dual norm of
(3.71) with respect to the trace norm on $\mathcal{L}(V_1, V_2)$ is equal to $\|R\|_{op,21}$. We have
already seen that

(3.72)    $|\operatorname{tr}_{V_1}(R \circ T)| \le \|R\|_{op,21}\,\|T\|_{tr,12},$

which says exactly that the aforementioned dual norm of (3.71) is less than or
equal to $\|R\|_{op,21}$. To establish the opposite inequality, let $\lambda \in V_1^*$ and $w \in V_2$
be given, and put $T_0(v) = \lambda(v)\,w$. Thus $T_0$ is a linear mapping from $V_1$ to $V_2$,

(3.73)    $(R \circ T_0)(v) = \lambda(v)\,R(w),$

and

(3.74)    $\operatorname{tr}_{V_1}(R \circ T_0) = \lambda(R(w)).$

By definition, $|\operatorname{tr}_{V_1}(R \circ T_0)|$ is less than or equal to the dual norm of (3.71) times
the trace norm of $T_0$. The trace norm of $T_0$ is equal to $\|\lambda\|_1^*\,\|w\|_2$, and hence
$|\lambda(R(w))|$ is less than or equal to the dual norm of (3.71) times $\|\lambda\|_1^*\,\|w\|_2$. Since
$\lambda \in V_1^*$ and $w \in V_2$ are arbitrary, this implies that $\|R\|_{op,21}$ is less than or equal
to the dual norm of (3.71), so that the two are the same.

Similarly, if T is a linear mapping from $V_1$ into $V_2$, then

(3.75)    $R \mapsto \operatorname{tr}_{V_1}(R \circ T)$

is a linear functional on $\mathcal{L}(V_2, V_1)$, and this defines a linear isomorphism from
$\mathcal{L}(V_1, V_2)$ onto $\mathcal{L}(V_2, V_1)^*$. It follows from the previous discussion that the
dual norm of (3.75) with respect to the operator norm on $\mathcal{L}(V_2, V_1)$ is equal
to $\|T\|_{tr,12}$, by the results in Section 2.3. In other words, we just saw that
the operator norm on $\mathcal{L}(V_2, V_1)$ corresponds to the dual of the trace norm on
$\mathcal{L}(V_1, V_2)$, and this implies that the dual of the operator norm on $\mathcal{L}(V_2, V_1)$
corresponds to the trace norm on $\mathcal{L}(V_1, V_2)$, as in Section 2.3.


3.7 The Hilbert-Schmidt norm

Let $(V_1, \langle \cdot, \cdot \rangle_1)$, $(V_2, \langle \cdot, \cdot \rangle_2)$ be inner product spaces, both real or both complex,
with norms $\|\cdot\|_1$, $\|\cdot\|_2$ associated to their inner products, as usual. If $\{a_j\}_{j=1}^p$,
$\{b_k\}_{k=1}^q$ are orthonormal bases for $V_1$, $V_2$, respectively, then

(3.76)    $v = \sum_{j=1}^p \langle v, a_j \rangle_1\,a_j, \qquad w = \sum_{k=1}^q \langle w, b_k \rangle_2\,b_k$

for every $v \in V_1$, $w \in V_2$, and we can express a linear mapping $T : V_1 \to V_2$ as

(3.77)    $T(v) = \sum_{j=1}^p \sum_{k=1}^q \langle v, a_j \rangle_1\,\langle T(a_j), b_k \rangle_2\,b_k.$

Because $\{a_j\}_{j=1}^p$ and $\{b_k\}_{k=1}^q$ are orthonormal bases for $V_1$ and $V_2$, we get that

(3.78)    $\sum_{j=1}^p \|T(a_j)\|_2^2 = \sum_{j=1}^p \sum_{k=1}^q |\langle T(a_j), b_k \rangle_2|^2$
          $= \sum_{j=1}^p \sum_{k=1}^q |\langle a_j, T^*(b_k) \rangle_1|^2 = \sum_{k=1}^q \|T^*(b_k)\|_1^2.$

The Hilbert-Schmidt norm of T is denoted $\|T\|_{HS}$ and defined to be the square
root of the common value of these sums. It follows that the Hilbert-Schmidt
norms of T and $T^*$ are the same, and do not depend on the particular choices
of orthonormal bases for $V_1$ and $V_2$.

If we express T(v) as

(3.79)    $T(v) = \sum_{j=1}^p \langle v, a_j \rangle_1\,T(a_j),$

and $(T^* \circ T)(v)$ as

(3.80)    $(T^* \circ T)(v) = \sum_{j=1}^p \sum_{l=1}^p \langle v, a_j \rangle_1\,\langle T(a_j), T(a_l) \rangle_2\,a_l,$

then we see that

(3.81)    $\operatorname{tr}_{V_1}(T^* \circ T) = \sum_{j=1}^p \|T(a_j)\|_2^2 = \|T\|_{HS}^2.$

Here the left side is the trace of $T^* \circ T$ as an operator on $V_1$, and similarly the
trace of $T \circ T^*$ on $V_2$ is equal to $\|T\|_{HS}^2$. One can check that

(3.82)    $\langle A, B \rangle_{\mathcal{L}(V_1, V_2)} = \operatorname{tr}_{V_1}(B^* \circ A) = \operatorname{tr}_{V_2}(A \circ B^*)$

defines an inner product on the vector space $\mathcal{L}(V_1, V_2)$ of linear transformations
from $V_1$ into $V_2$, so that the Hilbert-Schmidt norm is exactly the norm on
$\mathcal{L}(V_1, V_2)$ associated to this inner product. More precisely, if R is a linear
transformation on a real or complex inner product space V, then

(3.83)    $\operatorname{tr}_V R^* = \operatorname{tr}_V R$

in the real case, and

(3.84)    $\operatorname{tr}_V R^* = \overline{\operatorname{tr}_V R}$

in the complex case, as one can verify using the description of the matrix of
$R^*$ with respect to an orthonormal basis for V mentioned in Section 3.2. This
implies that (3.82) satisfies the symmetry property required to be an inner
product.
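
The three expressions for $\|T\|_{HS}^2$ in (3.78) and (3.81) can be compared numerically. The sketch below assumes a real 2 x 3 matrix, viewed as a mapping from $\mathbf{R}^3$ into $\mathbf{R}^2$ with the standard bases, so that the adjoint is the transpose; the matrix and the helper names are made up for illustration.

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(x, y):
    """Matrix product for rectangular matrices of compatible sizes."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]

def trace(m):
    return sum(m[j][j] for j in range(len(m)))

def hs_norm(m):
    """Hilbert-Schmidt norm: square root of the sum of squared entries."""
    return math.sqrt(sum(e * e for row in m for e in row))

T = [[1.0, 2.0, 0.0], [0.0, -1.0, 3.0]]       # a mapping from R^3 into R^2

via_entries = hs_norm(T) ** 2
via_TstarT = trace(matmul(transpose(T), T))   # trace of T* T on R^3
via_TTstar = trace(matmul(T, transpose(T)))   # trace of T T* on R^2
print(via_entries, via_TstarT, via_TTstar)
```

All three quantities equal 15 up to rounding, even though T* T and T T* act on spaces of different dimensions.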



3.8 Schmidt decompositions 

Let $(V_1, \langle \cdot, \cdot \rangle_1)$, $(V_2, \langle \cdot, \cdot \rangle_2)$ be inner product spaces again, both real or both
complex, with norms $\|\cdot\|_1$, $\|\cdot\|_2$ associated to their inner products. A Schmidt
decomposition for a linear mapping $T : V_1 \to V_2$ is a representation of T as

(3.85)    $T(v) = \sum_{j=1}^r \lambda_j\,\langle v, u_j \rangle_1\,w_j,$

where r is a positive integer, $u_1, \ldots, u_r$ and $w_1, \ldots, w_r$ are orthonormal vectors
in $V_1$ and $V_2$, respectively, and $\lambda_1, \ldots, \lambda_r$ are scalars. The existence of a Schmidt
decomposition uses the fact that $T^* \circ T$ is self-adjoint on $V_1$, and hence can be
diagonalized in an orthonormal basis. It also uses the observation that

(3.86)    $T(y) \perp T(z)$

in $V_2$ when $y, z \in V_1$, $y \perp z$, and y is an eigenvector for $T^* \circ T$.

If T has a Schmidt decomposition (3.85), then it is easy to see that

(3.87)    $\|T\|_{op} = \max(|\lambda_1|, \ldots, |\lambda_r|),$

and

(3.88)    $\|T\|_{HS} = \Big(\sum_{j=1}^r |\lambda_j|^2\Big)^{1/2}.$

Let us check that

(3.89)    $\|T\|_{tr} = \sum_{j=1}^r |\lambda_j|.$

Clearly

(3.90)    $\|T\|_{tr} \le \sum_{j=1}^r |\lambda_j|,$

by the definition of the trace norm, and

(3.91)    $\sum_{j=1}^r |\lambda_j| = \sum_{j=1}^r |\langle T(u_j), w_j \rangle_2|.$

Let $y_1, \ldots, y_k \in V_1$ and $z_1, \ldots, z_k \in V_2$ be arbitrary orthonormal collections
of vectors, and let us check that

(3.92)    $\sum_{l=1}^k |\langle R(y_l), z_l \rangle_2| \le \|R\|_{tr}$

for every linear mapping $R : V_1 \to V_2$. If R is of the form $R(v) = \langle v, a \rangle_1\,b$ for
some $a \in V_1$ and $b \in V_2$, then

(3.93)    $\sum_{l=1}^k |\langle R(y_l), z_l \rangle_2| = \sum_{l=1}^k |\langle y_l, a \rangle_1|\,|\langle b, z_l \rangle_2| \le \|a\|_1\,\|b\|_2,$

using the Cauchy-Schwarz inequality in the second step. Thus the left side of
(3.92) is less than or equal to the operator norm of R when R has rank 1, which
implies (3.92) in general. If T has Schmidt decomposition (3.85), then we can
apply (3.92) with $R = T$, $k = r$, $y_l = u_l$, and $z_l = w_l$, to get that the right side of
(3.91) is less than or equal to the trace norm of T, as desired.



3.9 S p norms 

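Let $(V_1, \langle \cdot, \cdot \rangle_1)$, $(V_2, \langle \cdot, \cdot \rangle_2)$ be inner product spaces again, both real or both
complex, with norms $\|\cdot\|_1$, $\|\cdot\|_2$ associated to their inner products, and let
p be a real number, $1 \le p < \infty$. If T is a linear mapping from $V_1$ to $V_2$
with Schmidt decomposition (3.85), and $y_1, \ldots, y_k \in V_1$, $z_1, \ldots, z_k \in V_2$ are
orthonormal collections of vectors, then

(3.94)    $\Big(\sum_{h=1}^k |\langle T(y_h), z_h \rangle_2|^p\Big)^{1/p} \le \Big(\sum_{j=1}^r |\lambda_j|^p\Big)^{1/p}.$

Equivalently,

(3.95)    $\Big(\sum_{h=1}^k \Big|\sum_{l=1}^r \lambda_l\,\langle y_h, u_l \rangle_1\,\langle w_l, z_h \rangle_2\Big|^p\Big)^{1/p} \le \Big(\sum_{l=1}^r |\lambda_l|^p\Big)^{1/p}.$

To see this, observe that

(3.96)    $\Big(\sum_{l=1}^r |\langle y_h, u_l \rangle_1|^2\Big)^{1/2} \le \|y_h\|_1 = 1$

and

(3.97)    $\Big(\sum_{l=1}^r |\langle w_l, z_h \rangle_2|^2\Big)^{1/2} \le \|z_h\|_2 = 1$

for each h, and that

(3.98)    $\Big(\sum_{h=1}^k |\langle y_h, u_l \rangle_1|^2\Big)^{1/2} \le \|u_l\|_1 = 1$

and

(3.99)    $\Big(\sum_{h=1}^k |\langle w_l, z_h \rangle_2|^2\Big)^{1/2} \le \|w_l\|_2 = 1$

for each l. Hence

(3.100)    $\sum_{l=1}^r |\langle y_h, u_l \rangle_1\,\langle w_l, z_h \rangle_2| \le 1$

for each h, and

(3.101)    $\sum_{h=1}^k |\langle y_h, u_l \rangle_1\,\langle w_l, z_h \rangle_2| \le 1$

for each l, by the Cauchy-Schwarz inequality. The desired estimate (3.95) can
now be derived from Schur's theorem in Section 2.5.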

The $S_p$ norm of T is denoted $\|T\|_{S_p}$ and defined by

(3.102)    $\|T\|_{S_p} = \Big(\sum_{j=1}^r |\lambda_j|^p\Big)^{1/p}$

when $1 \le p < \infty$, and

(3.103)    $\|T\|_{S_\infty} = \max(|\lambda_1|, \ldots, |\lambda_r|).$

This is equal to the trace norm of T when p = 1, the Hilbert-Schmidt norm of
T when p = 2, and the operator norm of T when $p = \infty$. Equivalently,

(3.104)    $\|T\|_{S_p} = \sup \Big(\sum_{h=1}^k |\langle T(y_h), z_h \rangle_2|^p\Big)^{1/p}$

when $1 \le p < \infty$, where the supremum is taken over all orthonormal collections
of vectors $y_1, \ldots, y_k$ and $z_1, \ldots, z_k$ in $V_1$ and $V_2$, since (3.94) shows that the
supremum is attained by the Schmidt decomposition. Using (3.104), one can
check that the $S_p$ norm satisfies the triangle inequality, and hence is a norm on
the vector space $\mathcal{L}(V_1, V_2)$ of linear mappings from $V_1$ into $V_2$.
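
For a concrete feel for (3.102) and (3.103), the following sketch computes $S_p$ norms of a real 2 x 2 matrix. It assumes the standard fact that the numbers $|\lambda_j|$ in a Schmidt decomposition are the square roots of the eigenvalues of $T^* \circ T$ (the singular values), which for a 2 x 2 matrix can be found in closed form. The function names and the sample matrix are hypothetical.

```python
import math

def singular_values_2x2(m):
    """Square roots of the eigenvalues of m^T m for a real 2x2 matrix m."""
    (a, b), (c, d) = m
    s = a * a + b * b + c * c + d * d     # trace of m^T m
    det = a * d - b * c                   # det(m^T m) equals det(m)^2
    disc = math.sqrt(max(s * s - 4 * det * det, 0.0))
    return math.sqrt((s + disc) / 2), math.sqrt(max((s - disc) / 2, 0.0))

def schatten_norm(m, p):
    """S_p norm as in (3.102), or (3.103) when p = float('inf')."""
    sigma = singular_values_2x2(m)
    if p == float("inf"):
        return max(sigma)
    return sum(x ** p for x in sigma) ** (1.0 / p)

T = [[3.0, 0.0], [4.0, 5.0]]   # singular values 3*sqrt(5) and sqrt(5)

s1 = schatten_norm(T, 1)               # trace norm
s2 = schatten_norm(T, 2)               # Hilbert-Schmidt norm
s_inf = schatten_norm(T, float("inf")) # operator norm
print(s1, s2, s_inf)
```

The value for p = 2 agrees with the square root of the sum of the squared entries, and the three norms decrease as p increases, as one would expect from their definitions.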

Similarly, if $p \ge 2$ and $y_1, \ldots, y_k \in V_1$ are orthonormal, then

(3.105)    $\Big(\sum_{h=1}^k \|T(y_h)\|_2^p\Big)^{1/p} \le \|T\|_{S_p}$

for every linear mapping $T : V_1 \to V_2$. To see this, let (3.85) be a Schmidt
decomposition for T, and observe that

(3.106)    $\|T(y_h)\|_2 = \Big(\sum_{j=1}^r |\lambda_j|^2\,|\langle y_h, u_j \rangle_1|^2\Big)^{1/2}$

for each h. If $q = p/2$ and $\mu_j = |\lambda_j|^2$, then (3.105) can be re-expressed as

(3.107)    $\Big(\sum_{h=1}^k \Big(\sum_{l=1}^r \mu_l\,|\langle y_h, u_l \rangle_1|^2\Big)^q\Big)^{1/q} \le \Big(\sum_{j=1}^r \mu_j^q\Big)^{1/q}.$

The orthonormality of $y_1, \ldots, y_k$ and $u_1, \ldots, u_r$ in $V_1$ implies that (3.96) holds for
each h and that (3.98) holds for each l, as before. The desired estimate again
follows from Schur's theorem in Section 2.5.



3.10 Duality 

Let $(V_1, \langle \cdot, \cdot \rangle_1)$, $(V_2, \langle \cdot, \cdot \rangle_2)$ be inner product spaces again, both real or both
complex, and with norms $\|\cdot\|_1$, $\|\cdot\|_2$ associated to their inner products. Let
T be a linear mapping from $V_1$ into $V_2$, and let R be a linear mapping from $V_2$
into $V_1$. Suppose that T has Schmidt decomposition (3.85), so that

(3.108)    $(R \circ T)(v) = \sum_{j=1}^r \lambda_j\,\langle v, u_j \rangle_1\,R(w_j)$

for each $v \in V_1$. Thus

(3.109)    $\operatorname{tr}_{V_1}(R \circ T) = \sum_{j=1}^r \lambda_j\,\langle R(w_j), u_j \rangle_1.$

If $1 < p, q < \infty$ are conjugate exponents, then we get that

(3.110)    $|\operatorname{tr}_{V_1}(R \circ T)| \le \Big(\sum_{j=1}^r |\lambda_j|^p\Big)^{1/p} \Big(\sum_{j=1}^r |\langle R(w_j), u_j \rangle_1|^q\Big)^{1/q},$

by Hölder's inequality. This implies that

(3.111)    $|\operatorname{tr}_{V_1}(R \circ T)| \le \|T\|_{S_p}\,\|R\|_{S_q},$

using the definition of the $S_p$ norm of T in terms of the Schmidt decomposition,
and the analogue of (3.104) for R and q. This also works when $p = \infty$ or $q = \infty$,
by the same argument.

The preceding inequality implies that

(3.112)    $T \mapsto \operatorname{tr}_{V_1}(R \circ T)$

has dual norm less than or equal to $\|R\|_{S_q}$ with respect to the norm $\|T\|_{S_p}$ on
$\mathcal{L}(V_1, V_2)$ when $1 < p, q < \infty$ are conjugate exponents. To show that the dual
norm is equal to $\|R\|_{S_q}$, it suffices to check that

(3.113)    $\operatorname{tr}_{V_1}(R \circ T) = \|T\|_{S_p}\,\|R\|_{S_q}$

for some $T \ne 0$. Let us begin this time with a Schmidt decomposition

(3.114)    $R(w) = \sum_{j=1}^r \mu_j\,\langle w, w_j \rangle_2\,u_j$

for R, where $u_1, \ldots, u_r$ and $w_1, \ldots, w_r$ are orthonormal vectors in $V_1$ and $V_2$,
respectively, and $\mu_1, \ldots, \mu_r$ are real or complex numbers, as appropriate. Let
us also restrict our attention now to linear mappings T from $V_1$ into $V_2$ with
Schmidt decomposition (3.85), using the same orthonormal vectors $u_1, \ldots, u_r$
and $w_1, \ldots, w_r$ as for R. In this case, the trace of $R \circ T$ reduces to

(3.115)    $\operatorname{tr}_{V_1}(R \circ T) = \sum_{j=1}^r \lambda_j\,\mu_j,$

and for any $\mu_1, \ldots, \mu_r$ one can choose $\lambda_1, \ldots, \lambda_r$, not all equal to 0, such that
(3.113) holds, as in Section 2.2.



Chapter 4

Seminorms and sublinear functions



As in the previous two chapters, we continue to restrict our attention to finite- 
dimensional vector spaces in this chapter. 



4.1 Seminorms 

A seminorm on a real or complex vector space V is a nonnegative real-valued
function N on V such that

(4.1)    $N(t\,v) = |t|\,N(v)$

for every $v \in V$ and $t \in \mathbf{R}$ or $\mathbf{C}$, as appropriate, and

(4.2)    $N(v + w) \le N(v) + N(w)$

for every $v, w \in V$. Note that N(0) = 0, as one can see by applying (4.1) with
t = 0. Thus a seminorm N on V is a norm when $N(v) > 0$ for every $v \in V$ with
$v \ne 0$.

If $\lambda$ is a linear functional on V, then

(4.3)    $N_\lambda(v) = |\lambda(v)|$

is a seminorm on V, and the sum and maximum of finitely many seminorms on
V are also seminorms on V.

If N is a seminorm on V and $v, w \in V$, then

(4.4)    $N(v) - N(w) \le N(v - w)$

and

(4.5)    $N(w) - N(v) \le N(w - v) = N(v - w),$

by the triangle inequality. Hence

(4.6)    $|N(v) - N(w)| \le N(v - w).$

Suppose for the moment that $V = \mathbf{R}^n$ or $\mathbf{C}^n$ for some positive integer n, and
let $|v|$ be the standard Euclidean norm on V. It is easy to see that there is a
nonnegative real number C such that

(4.7)    $N(v) \le C\,|v|$

for every $v \in V$, by expressing v as a linear combination of the standard basis
vectors for V and using (4.1) and (4.2). This together with (4.6) implies that
N is a continuous function on V with respect to the standard Euclidean metric
and topology.

Suppose that $V$ is a real vector space, $N$ is a seminorm on $V$, and $\lambda$ is
a linear functional on $V$ such that

(4.8)    $\lambda(v) \le N(v)$

for every $v \in V$. This implies that

(4.9)    $-\lambda(v) = \lambda(-v) \le N(-v) = N(v)$

for every $v \in V$, and hence that

(4.10)    $|\lambda(v)| \le N(v)$.

Similarly, if $V$ is a complex vector space, $N$ is a seminorm on $V$, and
$\lambda$ is a linear functional on $V$ such that

(4.11)    $\operatorname{Re} \lambda(v) \le N(v)$

for every $v \in V$, then we get that

(4.12)    $\operatorname{Re} (t \, \lambda(v)) = \operatorname{Re} \lambda(tv) \le N(tv) = N(v)$

for every $v \in V$ and $t \in \mathbf{C}$ with $|t| = 1$. This implies again that
(4.10) holds for every $v \in V$, by choosing $t$ so that
$t \, \lambda(v) = |\lambda(v)|$.

4.2 Sublinear functions 

A sublinear function on a real or complex vector space $V$ is a real-valued
function $p(v)$ on $V$ such that

(4.13)    $p(tv) = t \, p(v)$

for every $v \in V$ and nonnegative real number $t$, and

(4.14)    $p(v + w) \le p(v) + p(w)$

for every $v, w \in V$. In particular, $p(0) = 0$, as one can see by applying
(4.13) with $t = 0$. Note that seminorms are sublinear functions, but sublinear
functions are not required to be nonnegative. Linear functionals on real
vector spaces are sublinear functions, as are the real parts of linear
functionals on complex vector spaces. The sum and maximum of finitely many
sublinear functions are sublinear functions, which includes the maximum of a
sublinear function and $0$, to get a nonnegative sublinear function.

Let $p$ be a sublinear function on $V$, and observe that

(4.15)    $0 = p(0) \le p(v) + p(-v)$

for every $v \in V$. If

(4.16)    $p(-v) = p(v)$

for every $v \in V$, then it follows that

(4.17)    $p(v) \ge 0$

for every $v \in V$, and that $p$ is a seminorm on $V$ when $V$ is a real vector
space. Similarly, if $V$ is a complex vector space, and

(4.18)    $p(tv) = p(v)$

for every $v \in V$ and $t \in \mathbf{C}$ with $|t| = 1$, then $p$ is a seminorm
on $V$.

If $p$ is a sublinear function on $V$, then

(4.19)    $p(v) - p(w) \le p(v - w)$

and

(4.20)    $p(w) - p(v) \le p(w - v)$

for every $v, w \in V$, and hence

(4.21)    $|p(v) - p(w)| \le \max(p(v - w), p(w - v))$

for every $v, w \in V$. Note that

(4.22)    $N(v) = \max(p(v), p(-v))$

is a seminorm on $V$ when $V$ is a real vector space, by the remarks in the
previous paragraphs. Of course, a complex vector space may also be considered
to be a real vector space, by forgetting about multiplication by $i$. If
$V = \mathbf{R}^n$ or $\mathbf{C}^n$ for some positive integer $n$, then $N(v)$ is
bounded by a constant multiple of the standard Euclidean norm of $v$, as in the
previous section. This implies that $p(v)$ is continuous with respect to the
standard Euclidean metric and topology on $V$, as before.
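
As a concrete illustration of (4.13), (4.14), (4.21), and (4.22), the following Python sketch takes $p(v)$ to be the largest coordinate of $v \in \mathbf{R}^n$. This is a maximum of linear functionals, so it is sublinear, but it takes negative values. The function names `p` and `N` and the random test vectors are choices made here for illustration only.

```python
import random

def p(v):
    """Sublinear function on R^n: the largest coordinate (may be negative)."""
    return max(v)

def N(v):
    """Seminorm built from p as in (4.22): max(p(v), p(-v))."""
    return max(p(v), p([-x for x in v]))

rng = random.Random(0)
for _ in range(1000):
    v = [rng.uniform(-5, 5) for _ in range(4)]
    w = [rng.uniform(-5, 5) for _ in range(4)]
    t = rng.uniform(0, 3)
    s = [a + b for a, b in zip(v, w)]   # v + w
    d = [a - b for a, b in zip(v, w)]   # v - w
    e = [b - a for a, b in zip(v, w)]   # w - v
    assert abs(p([t * x for x in v]) - t * p(v)) < 1e-9    # (4.13)
    assert p(s) <= p(v) + p(w) + 1e-9                       # (4.14)
    assert abs(p(v) - p(w)) <= max(p(d), p(e)) + 1e-9       # (4.21)
    assert N(v) == max(abs(x) for x in v)                   # here N is the max norm
```

For this particular $p$, the seminorm in (4.22) is the maximum of the absolute values of the coordinates, which is in fact a norm.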



4.3 Another extension theorem 

Theorem 4.23 Let $V$ be a real vector space, and let $p$ be a sublinear function
on $V$. If $W$ is a linear subspace of $V$ and $\mu$ is a linear functional on $W$
such that

(4.24)    $\mu(w) \le p(w)$ for every $w \in W$,

then there is a linear functional $\widehat{\mu}$ on $V$ which is equal to $\mu$ on
$W$ and satisfies

(4.25)    $\widehat{\mu}(v) \le p(v)$ for every $v \in V$.

This is the analogue of Theorem 2.26 in Section 2.3 for sublinear functions
instead of norms. Note that the statement and proof of Theorem 2.26 already
work in exactly the same way for seminorms instead of norms. The case of
sublinear functions is essentially the same, except for a few simple changes
following the differences in the statements of the theorems.

As in (2.32), we would like to show that there is an $\alpha \in \mathbf{R}$ such
that

(4.26)    $\mu_j(x) + t \, \alpha \le p(x + t z)$ for every $x \in W_j$ and $t \in \mathbf{R}$.

This is equivalent to

(4.27)    $\mu_j(x) + \alpha \le p(x + z)$, $\mu_j(x) - \alpha \le p(x - z)$ for every $x \in W_j$,

because one can convert (4.26) into (4.27) when $t \ne 0$ using homogeneity, and
the $t = 0$ case of (4.26) follows from the induction hypothesis for $\mu_j$ and
$W_j$. Let us rewrite (4.27) as

(4.28)    $\mu_j(x) - p(x - z) \le \alpha \le p(x + z) - \mu_j(x)$ for every $x \in W_j$.

To show that there is an $\alpha \in \mathbf{R}$ that satisfies (4.28), it suffices
to verify

(4.29)    $\mu_j(x) - p(x - z) \le p(y + z) - \mu_j(y)$ for every $x, y \in W_j$,

which reduces to

(4.30)    $\mu_j(x + y) \le p(x - z) + p(y + z)$ for every $x, y \in W_j$,

by the linearity of $\mu_j$. Because $(x - z) + (y + z) = x + y$, the
subadditivity property of $p(v)$ implies that this condition holds if

(4.31)    $\mu_j(x + y) \le p(x + y)$ for every $x, y \in W_j$,

which is the same as

(4.32)    $\mu_j(w) \le p(w)$ for every $w \in W_j$.

This holds by induction hypothesis, which completes the proof of Theorem 4.23.

4.4 Minkowski functionals 

Let $V$ be a real or complex vector space, and let $A$ be a subset of $V$ such
that $0 \in A$. Suppose also that $A$ has the absorbing property that for each
$v \in V$ there is a positive real number $t$ such that

(4.33)    $tv \in A$.

If $V = \mathbf{R}^n$ or $\mathbf{C}^n$ for some positive integer $n$, and if $0$ is
an element of the interior of $A$ with respect to the standard Euclidean metric
and topology on $V$, then $A$ obviously has this property.

Under these conditions, the Minkowski functional on $V$ associated to $A$ is
defined by

(4.34)    $N_A(v) = \inf\{r > 0 : r^{-1} v \in A\}$

for each $v \in V$. Note that $N_A(v) \ge 0$ for every $v \in V$, $N_A(0) = 0$, and

(4.35)    $N_A(tv) = t \, N_A(v)$

for every $v \in V$ and nonnegative real number $t$. By construction,

(4.36)    $N_A(v) \le 1$

for every $v \in A$. If $V = \mathbf{R}^n$ or $\mathbf{C}^n$ for some positive
integer $n$, and if $A$ is an open set in $V$ with respect to the standard
Euclidean metric and topology, then

(4.37)    $N_A(v) < 1$

for every $v \in A$. This is because $r^{-1} v \in A$ for all $r$ sufficiently
close to $1$ when $v \in A$ and $A$ is an open set.

Let $-A$ be the set of vectors in $V$ of the form $-v$ with $v \in A$, and let
$tA$ be the set of vectors in $V$ of the form $tv$ with $v \in A$ for each real or
complex number $t$, as appropriate. Thus

(4.38)    $N_A(v) = \inf\{r > 0 : v \in rA\}$

for each $v \in V$. If $-A = A$, then

(4.39)    $N_A(-v) = N_A(v)$

for every $v \in V$. Similarly, if $V$ is a complex vector space, and if
$tA = A$ for every complex number $t$ with $|t| = 1$, then

(4.40)    $N_A(tv) = N_A(v)$

for every $v \in V$ and $t \in \mathbf{C}$ with $|t| = 1$.

Let us suppose from now on in this section that $A$ is star-like about $0$,
which means that $A$ contains every line segment in $V$ between $0$ and any other
element of $A$. Equivalently, $A$ is star-like about $0$ if

(4.41)    $tA \subseteq A$

for every nonnegative real number $t$ with $t \le 1$. If $v \in V$ satisfies
$N_A(v) < 1$, then there is a positive real number $r < 1$ such that $v \in rA$,
and it follows that $v \in A$. If $V = \mathbf{R}^n$ or $\mathbf{C}^n$ for some $n$
and $A$ is an open set in $V$ with respect to the standard metric and topology,
then we get that

(4.42)    $A = \{v \in V : N_A(v) < 1\}$.

Similarly, if $A$ is a closed set in $V = \mathbf{R}^n$ or $\mathbf{C}^n$, then

(4.43)    $A = \{v \in V : N_A(v) \le 1\}$.

To see this, it remains to check that $v \in A$ when $v \in V$ satisfies
$N_A(v) = 1$. In this case, $r^{-1} v \in A$ for some positive real numbers $r$
that are arbitrarily close to $1$, which implies that $v \in A$ when $A$ is closed.

Suppose now that $A$ is a convex set in $V$. Note that this implies that $A$ is
star-like about $0$, because $0 \in A$. We would like to show that

(4.44)    $N_A(v + w) \le N_A(v) + N_A(w)$

for every $v, w \in V$.

Let $v, w \in V$ be given, and let $r_v$, $r_w$ be positive real numbers such that

(4.45)    $r_v > N_A(v)$, $r_w > N_A(w)$.

This implies that $r_v^{-1} v$, $r_w^{-1} w$ are elements of $A$, because of the
definition of $N_A$ and the fact that $A$ is star-like about $0$. If $A$ is convex,
then it follows that

(4.46)    $(r_v + r_w)^{-1} (v + w) = \frac{r_v}{r_v + r_w} \, (r_v^{-1} v) + \frac{r_w}{r_v + r_w} \, (r_w^{-1} w)$

is an element of $A$ too, so that

(4.47)    $N_A(v + w) \le r_v + r_w$.

This implies (4.44), since $r_v$, $r_w$ can be arbitrarily close to $N_A(v)$,
$N_A(w)$.
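
To make the definition (4.34) concrete, here is a small Python sketch that approximates $N_A$ by bisection for one sample open convex set in $\mathbf{R}^2$, namely $A = \{(x, y) : x^2/4 + y^2 < 1, \ x < 1\}$. The set, the helper names `in_A`, `minkowski`, and `closed_form`, and the iteration count are assumptions made for this example. For this $A$ one can work out by hand that $N_A(x, y) = \max(\sqrt{x^2/4 + y^2}, x)$, which the code uses as a cross-check.

```python
import math
import random

def in_A(v):
    """Membership test for a sample open convex set A in R^2 containing 0."""
    x, y = v
    return x * x / 4 + y * y < 1 and x < 1

def minkowski(v, member=in_A, iters=60):
    """Approximate N_A(v) = inf{r > 0 : v/r in A} by bisection.

    Assumes A is absorbing and star-like about 0, so that v/r lies in A
    for every r above the infimum.
    """
    hi = 1.0
    while not member((v[0] / hi, v[1] / hi)):   # find some r with v/r in A
        hi *= 2.0
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if member((v[0] / mid, v[1] / mid)):
            hi = mid
        else:
            lo = mid
    return hi

def closed_form(v):
    """Hand-computed N_A for this particular A."""
    x, y = v
    return max(math.sqrt(x * x / 4 + y * y), x)

rng = random.Random(1)
for _ in range(200):
    v = (rng.uniform(-3, 3), rng.uniform(-3, 3))
    w = (rng.uniform(-3, 3), rng.uniform(-3, 3))
    s = (v[0] + w[0], v[1] + w[1])
    assert abs(minkowski(v) - closed_form(v)) < 1e-9
    assert minkowski(s) <= minkowski(v) + minkowski(w) + 1e-9   # (4.44)
```

Since this $A$ is not symmetric about $0$, the resulting $N_A$ is sublinear but not a seminorm: the values at $(2, 0)$ and $(-2, 0)$ differ.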



4.5 Convex cones 

Let $V$ be a vector space over the real numbers. A nonempty set $E \subseteq V$ is
said to be a cone if

(4.48)    $tv \in E$

for every $v \in E$ and nonnegative real number $t$, or equivalently,

(4.49)    $tE \subseteq E$

for every $t \ge 0$. In particular, $0 \in E$, since we can take $t = 0$, and
$E \ne \emptyset$. Note that every linear subspace of $V$ is a cone.

We shall be especially interested in convex cones, which are cones that are
also convex sets. Equivalently, a nonempty set $E \subseteq V$ is a convex cone if

(4.50)    $rv + tw \in E$

for every $v, w \in E$ and nonnegative real numbers $r$, $t$. Thus linear subspaces
of $V$ are also convex cones. If $V = \mathbf{R}^n$, then it is easy to see that

(4.51)    $\{v \in \mathbf{R}^n : v_j \ge 0 \text{ for } j = 1, \ldots, n\}$

is a convex cone in $V$.

If $E$ is any nonempty subset of $V$, then we can get a cone $C(E)$ in $V$ from
$E$, by taking

(4.52)    $C(E) = \bigcup_{t \ge 0} tE = \{tv : v \in E, \ t \ge 0\}$.

Of course, $C(E) = E$ when $E$ is already a cone in $V$. One can check that $C(E)$
is also convex in $V$ when $E$ is convex.

If $p$ is a sublinear function on $V$, then it is easy to see that

(4.53)    $C_p = \{v \in V : p(v) \le 0\}$

is a convex cone in $V$. This cone has another important property, which is that
it is a closed set in a suitable sense. To make this precise, it is convenient to
suppose that $V = \mathbf{R}^n$ for some positive integer $n$. As in Section 4.2,
sublinear functions on $\mathbf{R}^n$ are continuous with respect to the standard
Euclidean metric and topology, which implies that $C_p$ is a closed set.

Let $V$ be any real vector space again, and let $\| \cdot \|$ be a norm on $V$. If
$E$ is a nonempty subset of $V$, then put

(4.54)    $p_E(v) = \inf\{\|v - w\| : w \in E\}$

for each $v \in V$. Thus $p_E(v) = 0$ for every $v \in E$, and conversely
$p_E(v) = 0$ implies that $v \in E$ when $E$ is a closed set in a suitable sense. In
particular, this works when $V = \mathbf{R}^n$ and $E$ is a closed set with respect
to the standard Euclidean metric and topology, because of the remarks at the end
of Section 2.1, about the relationship between an arbitrary norm on
$\mathbf{R}^n$ and the standard Euclidean norm. Of course, if $V = \mathbf{R}^n$,
then one can simply take $\| \cdot \|$ to be the standard Euclidean norm on
$\mathbf{R}^n$.

If $E$ is a convex cone in $V$, then one can check that $p_E$ is a sublinear
function on $V$. It follows that every closed convex cone in $\mathbf{R}^n$ can be
expressed as in (4.53) for some sublinear function $p$ on $\mathbf{R}^n$.
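
For the cone (4.51) and the standard Euclidean norm, the distance function in (4.54) can be written down explicitly, because the nearest point of the cone to $v$ is obtained by replacing each negative coordinate of $v$ with $0$. The Python sketch below uses this formula (the name `dist_to_orthant` is introduced here for illustration) and spot-checks the two sublinearity conditions (4.13) and (4.14) on random vectors.

```python
import math
import random

def dist_to_orthant(v):
    """Euclidean distance from v to the cone {v : v_j >= 0 for all j}.

    Only the negative coordinates contribute, since the nearest point of
    the cone replaces each of them by 0.
    """
    return math.sqrt(sum(min(x, 0.0) ** 2 for x in v))

rng = random.Random(2)
for _ in range(1000):
    v = [rng.uniform(-2, 2) for _ in range(3)]
    w = [rng.uniform(-2, 2) for _ in range(3)]
    t = rng.uniform(0, 4)
    s = [a + b for a, b in zip(v, w)]
    # positive homogeneity (4.13)
    assert abs(dist_to_orthant([t * x for x in v]) - t * dist_to_orthant(v)) < 1e-9
    # subadditivity (4.14)
    assert dist_to_orthant(s) <= dist_to_orthant(v) + dist_to_orthant(w) + 1e-9
```

The function vanishes exactly on the cone, so the cone is recovered as the set where this sublinear function is less than or equal to $0$, as in (4.53).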

4.6 Dual cones 

Let $V$ be a real vector space again, and let $E$ be a nonempty subset of $V$.
Consider the set $E' \subseteq V^*$ consisting of all linear functionals $\lambda$
on $V$ such that

(4.55)    $\lambda(v) \ge 0$

for every $v \in E$. It is easy to see that this is a convex cone in $V^*$, known
as the dual cone associated to $E$. One can also check that

(4.56)    $C(E)' = E'$,

so that we may as well restrict our attention to convex cones $E$ in $V$.

Suppose for the moment that $V = \mathbf{R}^n$ for some positive integer $n$, and
let us identify $V^*$ with $\mathbf{R}^n$ as in Section 2.2. Observe that $E'$ is
the same as the dual cone associated to the closure $\overline{E}$ of $E$ with
respect to the standard Euclidean metric and topology. Thus we may as well
restrict our attention to closed subsets of $\mathbf{R}^n$, and hence to closed
convex cones. Similarly, the dual cone $E'$ of any set $E \subseteq \mathbf{R}^n$
automatically corresponds to a closed subset of $\mathbf{R}^n$, and therefore to a
closed convex cone.

If $V$ is any real vector space and $E$ is a nonempty subset of $V$, then let
$E''$ be the dual cone associated to $E' \subseteq V^*$, which is a convex cone in
the second dual $V^{**}$ of $V$. As in Section 2.3, $V^{**}$ can be identified with
$V$ in a natural way. Using this identification, one can check that

(4.57)    $E \subseteq E''$.

Suppose for convenience that $V = \mathbf{R}^n$ for some $n$ again. If $E$ is a
closed convex cone in $V$, then

(4.58)    $E = E''$.

To see this, it suffices to show that $E'' \subseteq E$, since we already know
(4.57). If $v$ is any element of $\mathbf{R}^n$ that is not in $E$, then we would
like to show that $v$ is also not in $E''$. Equivalently, we would like to show
that there is a linear functional $\lambda$ on $V$ such that $\lambda \ge 0$ on $E$
and $\lambda(v) < 0$.

It is easy to find such a linear functional $\lambda$ initially on the linear
span $W$ of $v$ in $V$, and we would like to extend $\lambda$ to all of $V$ using
Theorem 4.23 in Section 4.3. More precisely, let $p$ be a sublinear function on
$V$ such that $E$ is the set where $p \le 0$, which exists when $E$ is a closed
convex cone in $V = \mathbf{R}^n$, as in the previous section. It is convenient to
ask also that $p \ge 0$ everywhere on $V$, so that $E$ is the set where $p = 0$.
This holds by construction when $p = p_E$ as before, and otherwise one can simply
replace $p$ with $\max(p, 0)$.

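Let $\mu$ be the linear functional defined on the linear span $W$ of $v$ by

(4.59)    $\mu(tv) = t \, p(v)$

for each $t \in \mathbf{R}$. Thus

(4.60)    $\mu(tv) = p(tv)$

when $t \ge 0$, and

(4.61)    $\mu(tv) = t \, p(v) \le 0 \le p(tv)$

when $t \le 0$, so that $\mu \le p$ on $W$. Theorem 4.23 implies that there is an
extension $\widehat{\mu}$ of $\mu$ to a linear functional on $V$ such that
$\widehat{\mu} \le p$ on all of $V$. If $\lambda = -\widehat{\mu}$, then
$\lambda(v) = -p(v) < 0$, because $p(v) > 0$ when $v$ is not in $E$, and
$\lambda \ge -p$ on all of $V$. This implies that $\lambda \ge 0$ on $E$, since
$p = 0$ on $E$, as desired.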

As an example, consider the case where $E$ is the set of $v \in \mathbf{R}^n$ such
that $v_j \ge 0$ for $j = 1, \ldots, n$, as in (4.51). In this case, one can check
that $E' = E$, using the standard identification of $\mathbf{R}^n$ with its own
dual space. In particular, $E'' = E$.

4.7 Nonnegative self-adjoint operators

Let $(V, \langle \cdot, \cdot \rangle)$ be a real or complex inner product space.
A self-adjoint linear transformation $A$ on $V$ is said to be nonnegative if

(4.62)    $\langle A(v), v \rangle \ge 0$

for every $v \in V$. The identity transformation obviously has this property,
for instance. If $W$ is a linear subspace of $V$, then we have seen that the
orthogonal projection $P_W$ of $V$ onto $W$ is self-adjoint, as in Section 3.3. In
this case, we also have that

(4.63)    $\langle P_W(v), v \rangle = \langle P_W(v), P_W(v) \rangle = \|P_W(v)\|^2$

for every $v \in V$, as in (3.20), and hence that $P_W$ is nonnegative.

If $a \in V$, then it is easy to see that

(4.64)    $A_a(v) = \langle v, a \rangle \, a$

defines a nonnegative self-adjoint linear operator on $V$, which is the same as
the orthogonal projection onto the span of $a$ when $\|a\| = 1$. Of course, every
self-adjoint linear operator $A$ on $V$ is a linear combination of rank-one
operators like this, as a consequence of diagonalization. If the eigenvalues of
$A$ are nonnegative, then it follows that $A$ is nonnegative, because $A$ can be
expressed as a linear combination of rank-one operators like this with
nonnegative coefficients, by diagonalization. Conversely, one can check that the
eigenvalues of a nonnegative self-adjoint linear operator $A$ are nonnegative, by
applying the nonnegativity condition to the eigenvectors of $A$.

If $T$ is any linear operator on $V$, then $T^* T$ is self-adjoint and
nonnegative, because

(4.65)    $(T^* T)^* = T^* (T^*)^* = T^* T$

and

(4.66)    $\langle (T^* T)(v), v \rangle = \langle T(v), T(v) \rangle = \|T(v)\|^2 \ge 0$

for each $v \in V$. In particular, if $B$ is a self-adjoint linear operator on
$V$, then $B^2$ is a nonnegative self-adjoint linear operator on $V$. Conversely,
one can use diagonalizations to show that every nonnegative self-adjoint linear
operator $A$ on $V$ can be expressed as $B^2$ for some nonnegative self-adjoint
linear operator $B$ on $V$. Note that $-B^2$ is a nonnegative self-adjoint linear
operator on $V$ when $B$ is anti-self-adjoint.

If $A$ is a nonnegative self-adjoint linear transformation on $V$, then the trace
of $A$ is equal to the sum of the eigenvalues of $A$, with their appropriate
multiplicity, because of diagonalization. More precisely,

(4.67)    $\operatorname{tr} A = \|A\|_{S_1} = \|A\|_{tr}$,

as in Sections 3.8 and 3.9. Similarly, let us check that

(4.68)    $\operatorname{tr} AB \ge 0$

for every pair of nonnegative self-adjoint linear operators $A$, $B$ on $V$. If
$A = A_a$ is as in (4.64), then

(4.69)    $\operatorname{tr} AB = \langle B(a), a \rangle \ge 0$.

Otherwise, $A$ can be expressed as a sum of rank-one operators of this type, as
before, and (4.68) follows.

Conversely, if $B$ is a self-adjoint linear operator on $V$ such that (4.68) holds
for every nonnegative self-adjoint linear operator $A$ on $V$, then $B$ is also
nonnegative. This follows by applying this condition to $A = A_a$ as in (4.64), as
in the previous paragraph.
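
The inequality (4.68) and the rank-one identity (4.69) are easy to test numerically in the real case. The Python sketch below works with $3 \times 3$ real matrices stored as lists of rows, builds nonnegative symmetric matrices in the form $M^T M$ as in (4.65) and (4.66), and represents $A_a$ by the outer product $a a^T$. The helper names and the choice of dimension are assumptions made for this example.

```python
import random

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(X):
    return [list(row) for row in zip(*X)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def random_nonnegative(n, rng):
    """Return M^T M for a random real n x n matrix M."""
    M = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    return matmul(transpose(M), M)

def rank_one(a):
    """Matrix of A_a(v) = <v, a> a on R^n, namely the outer product a a^T."""
    return [[ai * aj for aj in a] for ai in a]

rng = random.Random(0)
for _ in range(200):
    A = random_nonnegative(3, rng)
    B = random_nonnegative(3, rng)
    assert trace(matmul(A, B)) >= -1e-12                     # (4.68)
    a = [rng.uniform(-1, 1) for _ in range(3)]
    Ba = [sum(B[i][j] * a[j] for j in range(3)) for i in range(3)]
    quad = sum(Ba[i] * a[i] for i in range(3))               # <B(a), a>
    assert abs(trace(matmul(rank_one(a), B)) - quad) < 1e-12  # (4.69)
```

The small negative tolerance in the first assertion only allows for floating-point rounding.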

Let $\mathcal{L}(V)$ be the vector space of all linear operators on $V$, and let
$\mathcal{L}_{sa}(V)$ be the collection of self-adjoint linear operators on $V$.
Thus $\mathcal{L}_{sa}(V)$ is a linear subspace of $\mathcal{L}(V)$ when $V$ is a
real vector space. If $V$ is a complex vector space, then $\mathcal{L}(V)$ is a
complex vector space, but $\mathcal{L}_{sa}(V)$ is a real vector space, which may
be considered as a real-linear subspace of $\mathcal{L}(V)$.

As in Section 3.7,

(4.70)    $\langle A, B \rangle_{\mathcal{L}(V)} = \operatorname{tr} A B^*$

defines an inner product on $\mathcal{L}(V)$, for which the corresponding norm is
the Hilbert-Schmidt norm. This reduces to

(4.71)    $\langle A, B \rangle_{\mathcal{L}_{sa}(V)} = \operatorname{tr} AB$

when $A$ and $B$ are self-adjoint. Using this inner product, we can identify
$\mathcal{L}_{sa}(V)$ with its own dual space in the usual way.

The set of nonnegative self-adjoint linear operators on $V$ forms a convex
cone in $\mathcal{L}_{sa}(V)$, and it is also a closed set in
$\mathcal{L}_{sa}(V)$ in a suitable sense. This cone is equal to its own dual
cone, when we identify the dual of $\mathcal{L}_{sa}(V)$ with itself using the
inner product in the previous paragraph. This follows from the earlier remarks
about traces of products of self-adjoint linear operators on $V$.



Chapter 5 

Sums and $\ell^p$ spaces



5.1 Nonnegative real numbers 

Let $E$ be a nonempty set, and let $f$ be a nonnegative real-valued function on
$E$. If $A$ is a nonempty subset of $E$ with only finitely many elements, then the
sum

(5.1)    $\sum_{x \in A} f(x)$

can be defined in the usual way. The sum

(5.2)    $\sum_{x \in E} f(x)$

is then defined as the supremum of the finite subsums (5.1), over all finite
nonempty subsets $A$ of $E$. More precisely, the sum (5.2) is considered to be
$+\infty$ as an extended real number when there is no finite upper bound for the
finite subsums (5.1).

Of course, if $E$ is itself a finite set, then this definition of the sum (5.2)
reduces to the usual one. If $E$ is the set $\mathbf{Z}_+$ of positive integers,
then the usual definition of the sum of an infinite series is equivalent to

(5.3)    $\sum_{j=1}^\infty f(j) = \sup_{n \ge 1} \sum_{j=1}^n f(j)$,

which is again interpreted as being $+\infty$ when there is no finite upper bound
for the partial sums. In this case, this definition of the infinite sum is
equivalent to the previous one. More precisely, this definition of the sum is
less than or equal to the previous one, because the partial sums
$\sum_{j=1}^n f(j)$ are subsums of the form (5.1) for each $n$. Similarly, the
previous definition of the sum is less than or equal to this one, because every
finite set $A \subseteq E = \mathbf{Z}_+$ is contained in a set of the form
$\{1, \ldots, n\}$ for some $n$, so that the corresponding subsum (5.1) is less
than or equal to the partial sum $\sum_{j=1}^n f(j)$.

Suppose that the finite subsums (5.1) are bounded by a nonnegative real
number $C$, so that the sum (5.2) is also less than or equal to $C$. In this
case, it is easy to see that

(5.4)    $E(f, \epsilon) = \{x \in E : f(x) \ge \epsilon\}$

has at most $C/\epsilon$ elements for each $\epsilon > 0$. Otherwise, if
$f(x) \ge \epsilon$ for more than $C/\epsilon$ elements $x$ of $E$, then there
would be a finite set $A \subseteq E$ such that (5.1) is larger than $C$. In
particular, $E(f, \epsilon)$ has only finitely many elements for each
$\epsilon > 0$. This implies that

(5.5)    $\{x \in E : f(x) > 0\} = \bigcup_{n=1}^\infty E(f, 1/n)$

has only finitely or countably many elements, so that (5.2) can be reduced to a
finite sum or an ordinary infinite series.
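
The counting bound for (5.4) can be seen in a small Python sketch. Here $f(x) = 1/x^2$ on the finite set $E = \{1, \ldots, 10000\}$, stored as a dictionary, and $C$ is taken to be the sum of $f$ over $E$. The function, the set, and the name `level_set` are choices made only for this illustration.

```python
def level_set(f, eps):
    """E(f, eps) = {x in E : f(x) >= eps} for f given as a dict on E."""
    return {x for x, val in f.items() if val >= eps}

f = {x: 1.0 / x**2 for x in range(1, 10001)}
C = sum(f.values())          # an upper bound for every finite subsum

for eps in (1.0, 0.1, 0.01, 0.001):
    # each element of E(f, eps) contributes at least eps to a sum that is <= C
    assert len(level_set(f, eps)) <= C / eps
```

Each point of $E(f, \epsilon)$ contributes at least $\epsilon$ to a subsum bounded by $C$, which is exactly the argument given above.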

If $f$, $g$ are nonnegative real-valued functions on $E$, then one can check that

(5.6)    $\sum_{x \in E} (f(x) + g(x)) = \sum_{x \in E} f(x) + \sum_{x \in E} g(x)$,

by reducing to the case of finite sums. This includes the case where some of the
sums are infinite, with the usual conventions for sums of extended real numbers.
Similarly, if $a$ is a nonnegative real number, then

(5.7)    $\sum_{x \in E} a \, f(x) = a \sum_{x \in E} f(x)$,

with the convention that $a \cdot (+\infty) = +\infty$ when $a > 0$. In this
context, it is also appropriate to make the convention that
$0 \cdot (+\infty) = 0$, since the left side of (5.7) is automatically equal to
$0$ when $a = 0$.



5.2 Summable functions 

A real or complex-valued function $f$ on a nonempty set $E$ is said to be
summable on $E$ if

(5.8)    $\sum_{x \in E} |f(x)|$

is finite. If $f$ and $g$ are summable functions on $E$, then it is easy to see
that $f + g$ is summable, since

(5.9)    $|f(x) + g(x)| \le |f(x)| + |g(x)|$

for each $x \in E$. Similarly, if $f$ is a summable function on $E$ and $a$ is a
real or complex number, as appropriate, then $a f$ is a summable function on $E$
too. Thus the real or complex-valued summable functions on $E$ form a vector
space over the real or complex numbers, as appropriate.

Let $f$ be a real or complex-valued summable function on $E$, and let
$\epsilon > 0$ be given. By definition of (5.8), there is a finite set
$A_\epsilon \subseteq E$ such that

(5.10)    $\sum_{x \in E} |f(x)| < \sum_{x \in A_\epsilon} |f(x)| + \epsilon$.

Equivalently,

(5.11)    $\sum_{x \in A} |f(x)| < \sum_{x \in A_\epsilon} |f(x)| + \epsilon$

for every finite set $A \subseteq E$. If $B$ is a finite subset of $E$ which is
disjoint from $A_\epsilon$, then we get that

(5.12)    $\sum_{x \in B} |f(x)| < \epsilon$,

by applying the previous inequality to $A = A_\epsilon \cup B$. This will be used
frequently to approximate various sums by finite sums.

We would like to define the sum $\sum_{x \in E} f(x)$ of a real or complex-valued
summable function $f$ on $E$. One way to do this is to express $f$ as a linear
combination of nonnegative real-valued summable functions on $E$, and then use
the previous definition of the sum for nonnegative real-valued functions on $E$.
Another way to do this is to use the fact that the set of $x \in E$ such that
$f(x) \ne 0$ has only finitely or countably many elements, as in the previous
section, and then treat the sum as a finite sum or an infinite series, by
enumerating the elements of this set. The summability of $f$ on $E$ implies that
such an infinite series would be absolutely convergent, and hence convergent.
The value of the sum would also not depend on the choice of the enumeration of
the set where $f \ne 0$, because absolutely convergent infinite series are
invariant under rearrangements.

Whichever way one uses to define the sum, a key point is that it should be
approximable by finite sums in a natural way. More precisely, for each
$\epsilon > 0$, there should be a finite set $A_\epsilon \subseteq E$ such that

(5.13)    $\Bigl| \sum_{x \in E} f(x) - \sum_{x \in A} f(x) \Bigr| < \epsilon$

for every finite set $A \subseteq E$ such that $A_\epsilon \subseteq A$. One can
check that both of the approaches to defining the sum described in the previous
paragraph have this approximation property. Practically any reasonable way of
defining the sum as a limit of finite subsums should also have this property,
because of the approximation property (5.12) mentioned earlier.

At any rate, it is easy to see that any definition of the sum that satisfies
(5.13) is uniquely determined by this property. In particular, this can be
helpful for showing that such a definition does not depend on any auxiliary
choices used to define the sum. In effect, (5.13) characterizes the sum as a
somewhat fancy limit of the finite subsums $\sum_{x \in A} f(x)$. This can be made
precise by treating these subsums as a net, which is indexed by the collection
of all finite subsets $A$ of $E$. This also uses the natural partial ordering on
the collection of all finite subsets of $E$ by inclusion.

If $f$ is any real or complex-valued summable function on $E$, then

(5.14)    $\Bigl| \sum_{x \in E} f(x) \Bigr| \le \sum_{x \in E} |f(x)|$.

This follows easily by approximating the sum by finite subsums, and using the
triangle inequality. If $f$ and $g$ are summable functions on $E$, and $a$ and $b$
are real or complex numbers, as appropriate, then one can also check that

(5.15)    $\sum_{x \in E} (a \, f(x) + b \, g(x)) = a \sum_{x \in E} f(x) + b \sum_{x \in E} g(x)$.

As usual, this can be shown by approximating the sums by finite subsums.
Using these two properties of the sum, we get that

(5.16)    $\Bigl| \sum_{x \in E} f(x) - \sum_{x \in E} g(x) \Bigr| = \Bigl| \sum_{x \in E} (f(x) - g(x)) \Bigr| \le \sum_{x \in E} |f(x) - g(x)|$

for any pair of summable functions $f$, $g$ on $E$.

If $f$ is any real or complex-valued function on $E$, then the support of $f$ is
the set $\operatorname{supp} f$ of $x \in E$ such that $f(x) \ne 0$. If the support
of $f$ has only finitely many elements, then the sum $\sum_{x \in E} f(x)$ reduces
to an ordinary finite sum. The sum $\sum_{x \in E} f(x)$ of a summable function
$f$ on $E$ can be characterized as the unique linear functional on the vector
space of all summable functions on $E$ that satisfies (5.14) and reduces to the
ordinary finite sum when $f$ has finite support. This follows by approximating
an arbitrary summable function on $E$ by functions with finite support using
(5.12), and then using (5.16) to analyze the corresponding sums.



5.3 Convergence theorems 

Let $\{f_j\}_{j=1}^\infty$ be a sequence of real or complex-valued summable
functions on a nonempty set $E$, and suppose that

(5.17)    $\lim_{j \to \infty} f_j(x) = f(x)$

for every $x \in E$. Of course,

(5.18)    $\lim_{j \to \infty} \sum_{x \in E} f_j(x) = \sum_{x \in E} f(x)$

when $E$ is finite, but the situation is more complicated when $E$ is infinite.
For example, if $E$ is the set of positive integers, $f_j(j) = 1$ for each $j$, and
$f_j(x) = 0$ when $x \ne j$, then

(5.19)    $\sum_{x \in E} f_j(x) = 1$

for each $j$, but $\{f_j(x)\}_{j=1}^\infty$ converges to $0$ for each $x \in E$.
Alternatively, if we take $f_j(x) = 1/j$ when $x \le j$ and $f_j(x) = 0$ when
$x > j$, then we get the same conclusions, but with $\{f_j\}_{j=1}^\infty$
converging to $0$ uniformly on $E$.
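
Both counterexamples can be checked directly, since each $f_j$ has finite support. The Python sketch below stores a function on the positive integers by its nonzero values, using exact rational arithmetic; the names `bump`, `spread`, `total`, and `value` are introduced here for illustration.

```python
from fractions import Fraction

def bump(j):
    """f_j with f_j(j) = 1 and f_j(x) = 0 for x != j, stored by its support."""
    return {j: Fraction(1)}

def spread(j):
    """f_j with f_j(x) = 1/j for x <= j and f_j(x) = 0 for x > j."""
    return {x: Fraction(1, j) for x in range(1, j + 1)}

def total(f):
    """Sum of a finitely supported function over all of E."""
    return sum(f.values(), Fraction(0))

def value(f, x):
    return f.get(x, Fraction(0))

for j in (1, 10, 100, 1000):
    assert total(bump(j)) == 1 and total(spread(j)) == 1    # (5.19)
    assert max(spread(j).values()) == Fraction(1, j)        # sup over E is 1/j

# At a fixed point, say x = 5, the values tend to 0 as j grows.
print([value(bump(j), 5) for j in (6, 60, 600)])
print([value(spread(j), 5) for j in (6, 60, 600)])
```

In both cases the sums stay equal to $1$ while the pointwise limit is $0$, so (5.18) fails; in the second case the supremum $1/j$ shows that the convergence is even uniform.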

However, there are some positive results, as follows. Suppose that
$\{f_j(x)\}_{j=1}^\infty$ is a monotone increasing sequence of nonnegative
real-valued functions on $E$, so that

(5.20)    $f_j(x) \le f_{j+1}(x)$

for every $x \in E$ and $j \ge 1$. Put

(5.21)    $f(x) = \sup_{j \ge 1} f_j(x)$

for each $x \in E$, which may be $+\infty$. Thus

(5.22)    $f_j(x) \to \sup_{l \ge 1} f_l(x)$ as $j \to \infty$

for every $x \in E$, with the usual interpretation when $f(x) = +\infty$. Under
these conditions, the monotone convergence theorem implies that

(5.23)    $\sum_{x \in E} f_j(x) \to \sum_{x \in E} f(x)$ as $j \to \infty$,

with suitable interpretations when any of the quantities involved are infinite.
Of course,

(5.24)    $\sum_{x \in E} f_j(x) \le \sum_{x \in E} f_{j+1}(x) \le \sum_{x \in E} f(x)$

for each $j$, by monotonicity. If $\sum_{x \in E} f_l(x) = +\infty$ for some $l$,
then

(5.25)    $\sum_{x \in E} f_j(x) = \sum_{x \in E} f(x) = +\infty$

for each $j \ge l$, and (5.23) is trivial. Similarly, if $f(x_0) = +\infty$ for
some $x_0 \in E$, then $\sum_{x \in E} f(x)$ is interpreted as being equal to
$+\infty$, and

(5.26)    $\sum_{x \in E} f_j(x) \ge f_j(x_0)$

also tends to $+\infty$ as $j \to \infty$. Suppose then that each $f_j$ is summable,
and that $f(x)$ is finite for every $x \in E$. In this case, one can get (5.23)
using the fact that

(5.27)    $\sum_{x \in A} f(x) = \lim_{j \to \infty} \sum_{x \in A} f_j(x) \le \lim_{j \to \infty} \sum_{x \in E} f_j(x)$

for every finite set $A \subseteq E$, since $\sum_{x \in E} f(x)$ is equal to the
supremum of $\sum_{x \in A} f(x)$ over all finite subsets $A$ of $E$.

Now let $\{f_j\}_{j=1}^\infty$ be any sequence of nonnegative real-valued
functions on $E$, and put

(5.28)    $f(x) = \liminf_{j \to \infty} f_j(x)$.

Under these conditions, Fatou's lemma states that

(5.29)    $\sum_{x \in E} f(x) \le \liminf_{j \to \infty} \sum_{x \in E} f_j(x)$,

again with suitable interpretations when any of the quantities are infinite. If
$A$ is a finite subset of $E$, then

(5.30)    $\sum_{x \in A} f(x) \le \liminf_{j \to \infty} \sum_{x \in A} f_j(x)$

is a well-known property of the lower limit. This implies that

(5.31)    $\sum_{x \in A} f(x) \le \liminf_{j \to \infty} \sum_{x \in E} f_j(x)$,

because the sum of $f_j(x)$ over $x \in A$ is less than or equal to the sum of
$f_j(x)$ over $x \in E$ when $A \subseteq E$. In order to get (5.29), it suffices
to take the supremum of the left side of (5.31) over all finite subsets $A$ of
$E$.

Suppose now that $\{f_j\}_{j=1}^\infty$ is a sequence of real or complex-valued
functions on $E$ that converges pointwise to another function $f$ on $E$. Suppose
also that $h$ is a nonnegative real-valued function on $E$ which is summable, and
that

(5.32)    $|f_j(x)| \le h(x)$

for every $x \in E$ and $j \ge 1$. This implies that

(5.33)    $|f(x)| \le h(x)$

for every $x \in E$, because $\{f_j(x)\}_{j=1}^\infty$ converges to $f(x)$ for
every $x \in E$ by hypothesis. Note that $f_j$ is a summable function on $E$ for
each $j$, and that $f$ is a summable function on $E$ too, since $h$ is summable.

Under these conditions, the dominated convergence theorem implies that

(5.34)    $\lim_{j \to \infty} \sum_{x \in E} f_j(x) = \sum_{x \in E} f(x)$.

To prove this, it suffices to show that

(5.35)    $\lim_{j \to \infty} \sum_{x \in E} |f_j(x) - f(x)| = 0$,

because of (5.16). Let $\epsilon > 0$ be given, and let us show that there is an
$L \ge 1$ such that

(5.36)    $\sum_{x \in E} |f(x) - f_j(x)| < \epsilon$

for every $j \ge L$.

Because $h$ is summable on $E$, there is a finite set $E_\epsilon \subseteq E$ such
that

(5.37)    $\sum_{x \in E \setminus E_\epsilon} |h(x)| < \frac{\epsilon}{4}$,

as in (5.12). Thus

(5.38)    $\sum_{x \in E \setminus E_\epsilon} |f(x) - f_j(x)| \le \sum_{x \in E \setminus E_\epsilon} 2 \, |h(x)| < \frac{\epsilon}{2}$

for each $j$. We also have that

(5.39)    $\lim_{j \to \infty} \sum_{x \in E_\epsilon} |f(x) - f_j(x)| = 0$,

because $E_\epsilon$ is finite and $\{f_j(x)\}_{j=1}^\infty$ converges to $f(x)$
for each $x \in E$. This implies that there is an $L \ge 1$ such that

(5.40)    $\sum_{x \in E_\epsilon} |f(x) - f_j(x)| < \frac{\epsilon}{2}$

when $j \ge L$. Combining (5.38) and (5.40), we get that (5.36) holds when
$j \ge L$, as desired.



5.4 Double sums 

Let $E_1$ and $E_2$ be nonempty sets, and let $E = E_1 \times E_2$ be their
Cartesian product. If $f(x, y)$ is a nonnegative real-valued function on $E$,
then we would like to check that

(5.41)    $\sum_{(x, y) \in E} f(x, y) = \sum_{x \in E_1} \Bigl( \sum_{y \in E_2} f(x, y) \Bigr) = \sum_{y \in E_2} \Bigl( \sum_{x \in E_1} f(x, y) \Bigr)$.

More precisely, it may be that $\sum_{y \in E_2} f(x', y) = +\infty$ for some
$x' \in E_1$, or that $\sum_{x \in E_1} f(x, y') = +\infty$ for some $y' \in E_2$,
in which case the corresponding iterated sum in (5.41) is considered to be
$+\infty$ as well.

Let $A$ be a finite subset of $E$, and $A_1$, $A_2$ be finite subsets of $E_1$,
$E_2$, respectively, such that

(5.42)    $A \subseteq A_1 \times A_2$.

Clearly

(5.43)    $\sum_{(x, y) \in A} f(x, y) \le \sum_{x \in A_1} \Bigl( \sum_{y \in A_2} f(x, y) \Bigr) = \sum_{y \in A_2} \Bigl( \sum_{x \in A_1} f(x, y) \Bigr)$.

This implies that $\sum_{(x, y) \in A} f(x, y)$ is less than or equal to each of

(5.44)    $\sum_{x \in E_1} \Bigl( \sum_{y \in E_2} f(x, y) \Bigr)$ and $\sum_{y \in E_2} \Bigl( \sum_{x \in E_1} f(x, y) \Bigr)$.

It follows that $\sum_{(x, y) \in E} f(x, y)$ is less than or equal to the same
iterated sums, because $A$ is an arbitrary finite subset of $E$.

In the other direction, if $A_1$ and $A_2$ are arbitrary finite subsets of $E_1$
and $E_2$, respectively, then

(5.45)    $\sum_{x \in A_1} \Bigl( \sum_{y \in A_2} f(x, y) \Bigr) = \sum_{y \in A_2} \Bigl( \sum_{x \in A_1} f(x, y) \Bigr) = \sum_{(x, y) \in A_1 \times A_2} f(x, y)$,

and therefore

(5.46)    $\sum_{x \in A_1} \Bigl( \sum_{y \in A_2} f(x, y) \Bigr) = \sum_{y \in A_2} \Bigl( \sum_{x \in A_1} f(x, y) \Bigr) \le \sum_{(x, y) \in E} f(x, y)$.

Using this, it is easy to see that

(5.47)    $\sum_{x \in A_1} \Bigl( \sum_{y \in E_2} f(x, y) \Bigr) \le \sum_{(x, y) \in E} f(x, y)$

and

(5.48)    $\sum_{y \in A_2} \Bigl( \sum_{x \in E_1} f(x, y) \Bigr) \le \sum_{(x, y) \in E} f(x, y)$.

This implies that the iterated sums are less than or equal to the sum of
$f(x, y)$ over $E$, as desired, by taking the suprema over $A_1$ and $A_2$.

Now let $f(x, y)$ be a real or complex-valued function on $E = E_1 \times E_2$. If
any of the sums

(5.49)    $\sum_{(x, y) \in E} |f(x, y)|$, $\sum_{x \in E_1} \Bigl( \sum_{y \in E_2} |f(x, y)| \Bigr)$, $\sum_{y \in E_2} \Bigl( \sum_{x \in E_1} |f(x, y)| \Bigr)$

are finite, then they are all finite, and equal to each other, by the previous
discussion. Suppose that this is the case, so that $f(x, y)$ is a summable
function on $E$. We also get that

(5.50)    $\sum_{y \in E_2} |f(x, y)| < +\infty$

for every $x \in E_1$, and

(5.51)    $\sum_{x \in E_1} |f(x, y)| < +\infty$

for every $y \in E_2$, so that $f(x, y)$ is a summable function of $y$ on $E_2$ for
each $x \in E_1$, and $f(x, y)$ is a summable function of $x$ on $E_1$ for each
$y \in E_2$. Put

(5.52)    $f_1(x) = \sum_{y \in E_2} f(x, y)$

for each $x \in E_1$, and

(5.53)    $f_2(y) = \sum_{x \in E_1} f(x, y)$

for each $y \in E_2$. Thus

(5.54)    $|f_1(x)| \le \sum_{y \in E_2} |f(x, y)|$

for every $x \in E_1$, and

(5.55)    $|f_2(y)| \le \sum_{x \in E_1} |f(x, y)|$

for every $y \in E_2$. This implies that $f_1(x)$ is a summable function on $E_1$,
and that $f_2(y)$ is a summable function on $E_2$, because of the finiteness of
the iterated sums in (5.49). Under these conditions, we also have that

(5.56)    $\sum_{(x, y) \in E} f(x, y) = \sum_{x \in E_1} f_1(x) = \sum_{y \in E_2} f_2(y)$.

One way to show this is to express $f(x, y)$ as a linear combination of
nonnegative summable functions on $E$, and apply (5.41) in that case.
Alternatively, one can approximate $f(x, y)$ by functions with finite support on
$E$, as in (5.12). The equality of the iterated and double sums is clear for
functions with finite support, and one can use the previous results for
nonnegative functions to estimate the errors in the approximations.
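
As a small worked instance of the finite-rectangle identity (5.45) and the bound (5.46), the Python sketch below takes $E_1 = E_2$ to be the positive integers and $f(x, y) = 2^{-x} 3^{-y}$, for which the sum over $E$ equals $1 \cdot \tfrac{1}{2} = \tfrac{1}{2}$ by summing two geometric series. The function and the helper name `rect_sum` are assumptions made for this example; exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction
from itertools import product

def f(x, y):
    """Sample nonnegative function on pairs of positive integers."""
    return Fraction(1, 2**x * 3**y)

def rect_sum(m, n):
    """Sum of f over {1..m} x {1..n}, computed three ways as in (5.45)."""
    rows = sum(sum(f(x, y) for y in range(1, n + 1)) for x in range(1, m + 1))
    cols = sum(sum(f(x, y) for x in range(1, m + 1)) for y in range(1, n + 1))
    pairs = sum(f(x, y) for x, y in product(range(1, m + 1), range(1, n + 1)))
    assert rows == cols == pairs
    return pairs

for m, n in [(1, 1), (3, 2), (10, 10)]:
    s = rect_sum(m, n)
    # closed form from the two finite geometric sums
    assert s == (1 - Fraction(1, 2**m)) * (1 - Fraction(1, 3**n)) / 2
    assert s < Fraction(1, 2)            # bounded by the sum over E, as in (5.46)
```

The rectangle sums increase toward $\tfrac{1}{2}$ as $m$ and $n$ grow, in line with the definition of the sum over $E$ as a supremum of finite subsums.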



5.5 $\ell^p$ spaces

Let $E$ be a nonempty set, and let $p$ be a positive real number. A real or
complex-valued function $f$ on $E$ is said to be $p$-summable if $|f(x)|^p$ is a
summable function on $E$. The spaces of real and complex-valued $p$-summable
functions on $E$ are denoted $\ell^p(E, \mathbf{R})$ and $\ell^p(E, \mathbf{C})$,
respectively, although we may also use $\ell^p(E)$ to include both cases at the
same time. If $f$ and $g$ are $p$-summable functions on $E$, then it is easy to
see that $f + g$ is also $p$-summable, using the observation that

(5.57)    $|f(x) + g(x)| \le |f(x)| + |g(x)| \le 2 \max(|f(x)|, |g(x)|)$

for every $x \in E$, and hence

(5.58)    $|f(x) + g(x)|^p \le 2^p \max(|f(x)|^p, |g(x)|^p) \le 2^p \, (|f(x)|^p + |g(x)|^p)$.

Similarly, $a \, f(x)$ is $p$-summable on $E$ when $f(x)$ is $p$-summable on $E$ and
$a$ is a real or complex number, as appropriate, so that $\ell^p(E, \mathbf{R})$
and $\ell^p(E, \mathbf{C})$ are vector spaces with respect to pointwise addition
and scalar multiplication of functions on $E$.

If $f$ is a real or complex-valued $p$-summable function on $E$, then we put

(5.59)  $\|f\|_p = \Big( \sum_{x \in E} |f(x)|^p \Big)^{1/p}$.

Thus

(5.60)  $\|a f\|_p = |a| \, \|f\|_p$

for every real or complex number $a$, as appropriate. In this context, Minkowski's
inequality states that

(5.61)  $\|f_1 + f_2\|_p \le \|f_1\|_p + \|f_2\|_p$

when $p \ge 1$ and $f_1$, $f_2$ are $p$-summable functions on $E$. This follows from
the version (1.73) of Minkowski's inequality for finite sums, although analogous
arguments could be used more directly in this case. If $0 < p \le 1$, then

(5.62)  $\|f_1 + f_2\|_p^p \le \|f_1\|_p^p + \|f_2\|_p^p$,

as in (1.87).
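
As a quick numerical sanity check of (5.61) and (5.62), here is a minimal Python
sketch for functions with finite support, stored as dictionaries from points of $E$
to values. The helper names `lp_norm` and `add` and the sample data are assumptions
made for this illustration only; they are not notation from the text.

```python
def lp_norm(f, p):
    """Return (sum |f(x)|^p)^(1/p) for a finitely supported f given as a dict."""
    return sum(abs(v) ** p for v in f.values()) ** (1.0 / p)


def add(f, g):
    """Pointwise sum of two finitely supported functions."""
    return {x: f.get(x, 0) + g.get(x, 0) for x in set(f) | set(g)}


f1 = {"a": 3.0, "b": -1.0, "c": 2.0}
f2 = {"b": 4.0, "c": -2.5, "d": 1.0}
s = add(f1, f2)

# Minkowski's inequality (5.61) for an exponent p >= 1.
p = 3.0
minkowski_ok = lp_norm(s, p) <= lp_norm(f1, p) + lp_norm(f2, p)

# The substitute (5.62) for an exponent 0 < p <= 1.
r = 0.5
subadditive_ok = lp_norm(s, r) ** r <= lp_norm(f1, r) ** r + lp_norm(f2, r) ** r

# The ordinary triangle inequality can fail when p < 1.
u, v = {"a": 1.0}, {"b": 1.0}
triangle_fails = lp_norm(add(u, v), r) > lp_norm(u, r) + lp_norm(v, r)
```

The last check uses two functions supported at different points: their sum has
$\|\cdot\|_{1/2}$ equal to $(1 + 1)^2 = 4$, while each summand has norm $1$.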

Let $\ell^\infty(E, \mathbf{R})$, $\ell^\infty(E, \mathbf{C})$ be the spaces of bounded real and complex-valued
functions $f$ on $E$, respectively, which is to say that the values of $f$ on $E$ are
contained in a bounded subset of $\mathbf{R}$ or $\mathbf{C}$. Of course, the sum and product of
two bounded functions are bounded. If $f$ is a bounded function on $E$, then put

(5.63)  $\|f\|_\infty = \sup \{ |f(x)| : x \in E \}$.

Clearly

(5.64)  $\|a f\|_\infty = |a| \, \|f\|_\infty$

for every real or complex number $a$, as appropriate. One can also check that

(5.65)  $\|f_1 + f_2\|_\infty \le \|f_1\|_\infty + \|f_2\|_\infty$

and

(5.66)  $\|f_1 f_2\|_\infty \le \|f_1\|_\infty \|f_2\|_\infty$

for all bounded real or complex-valued functions $f_1$, $f_2$ on $E$.

If $f$ and $g$ are real or complex-valued 2-summable functions on $E$, then their
product $f g$ is a summable function on $E$, because

(5.67)  $|f(x)| \, |g(x)| \le \max(|f(x)|^2, |g(x)|^2) \le |f(x)|^2 + |g(x)|^2$

for every $x \in E$. Alternatively, one can use the well-known fact that

(5.68)  $2 a b \le a^2 + b^2$

for all nonnegative real numbers $a$ and $b$ to get a better estimate. Put

(5.69)  $\langle f, g \rangle = \sum_{x \in E} f(x) g(x)$

in the case of real-valued functions on $E$, and

(5.70)  $\langle f, g \rangle = \sum_{x \in E} f(x) \overline{g(x)}$

in the complex case. It is easy to see that (5.69) defines an inner product
on $\ell^2(E, \mathbf{R})$, that (5.70) defines an inner product on $\ell^2(E, \mathbf{C})$, and that the
corresponding norms are the same as the $\ell^2$ norm $\|f\|_2$ defined earlier.



5.6 Additional properties

If $f$ is a $p$-summable function on a nonempty set $E$ for some $p > 0$, then it is
easy to see that $f$ is bounded on $E$, and that

(5.71)  $\|f\|_\infty \le \|f\|_p$,

as in (1.80). Similarly, if $p$, $q$ are positive real numbers with $p \le q$, and $f$ is a
$p$-summable function on $E$, then $f$ is $q$-summable, and

(5.72)  $\|f\|_q \le \|f\|_p$,

as in (1.81).
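
The monotonicity in (5.71) and (5.72) is easy to observe numerically. The following
Python sketch does so for one finitely supported function; the helper names
`lp_norm` and `sup_norm` and the sample values are illustrative assumptions.

```python
def lp_norm(f, p):
    """Return (sum |f(x)|^p)^(1/p) for a finitely supported f given as a dict."""
    return sum(abs(v) ** p for v in f.values()) ** (1.0 / p)


def sup_norm(f):
    """Return max |f(x)|, which is the supremum for a finitely supported f."""
    return max(abs(v) for v in f.values())


f = {1: 0.5, 2: -2.0, 3: 1.5, 4: 0.25}
exponents = [0.5, 1.0, 2.0, 4.0, 8.0]
norms = [lp_norm(f, p) for p in exponents]

# (5.72): the p-norm does not increase as the exponent increases.
nonincreasing = all(a >= b for a, b in zip(norms, norms[1:]))

# (5.71): every p-norm dominates the supremum norm.
dominates_sup = all(n >= sup_norm(f) for n in norms)
```

For this particular $f$ the norms decrease towards $\|f\|_\infty = 2$ as $p$ grows.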

A real or complex-valued function $f$ on $E$ is said to "vanish at infinity" if
for each $\epsilon > 0$ there is a finite set $A_\epsilon \subseteq E$ such that

(5.73)  $|f(x)| < \epsilon$

for every $x \in E \setminus A_\epsilon$. Of course, any function on $E$ with finite support satisfies
this condition. Let $c_0(E, \mathbf{R})$, $c_0(E, \mathbf{C})$ denote the spaces of real and
complex-valued functions on $E$, respectively, with this property. As before, we may
also use the notation $c_0(E)$ to include both cases at the same time. Note that these
are linear subspaces of the corresponding spaces of bounded functions on $E$.

If $f$ is $p$-summable on $E$ for some $p > 0$, then $f$ vanishes at infinity on $E$.
Equivalently, if there is an $\epsilon > 0$ such that $|f(x)| \ge \epsilon$ for infinitely many $x \in E$,
then $f$ is not $p$-summable for any $p > 0$. Note that a bounded function $f$ on $E$
vanishes at infinity if and only if for each $\epsilon > 0$ there is a function $f_\epsilon$ with finite
support on $E$ such that

(5.74)  $\|f - f_\epsilon\|_\infty < \epsilon$.

If $f$ is a $p$-summable function on $E$, then one can check that for each $\epsilon > 0$
there is a function $f_\epsilon$ with finite support on $E$ such that

(5.75)  $\|f - f_\epsilon\|_p < \epsilon$.

In particular, this implies (5.74) in this case, because of (5.71).

Suppose now that $\{f_j\}_{j=1}^\infty$ is a sequence of real or complex-valued functions
on $E$ that converges pointwise to a function $f$ on $E$. If there are positive real
numbers $p$, $C$ such that $f_j$ is $p$-summable for each $j$, with

(5.76)  $\|f_j\|_p \le C$

for each $j$, then $f$ is $p$-summable too, and

(5.77)  $\|f\|_p \le C$.

This can be derived from Fatou's lemma, as in Section 5.3.
Similarly, if $f_j$ is a bounded function on $E$ for each $j$, with

(5.78)  $\|f_j\|_\infty \le C$

for each $j$, then $f$ is bounded on $E$ as well, and

(5.79)  $\|f\|_\infty \le C$.

This is easy to verify, directly from the definitions. However, it is easy to give
examples where $f_j$ vanishes at infinity on $E$ for each $j$, but $f$ does not vanish at
infinity on $E$. If $f_j$ vanishes at infinity on $E$ for each $j$, and $\{f_j\}_{j=1}^\infty$ converges to
$f$ uniformly on $E$, then $f$ vanishes at infinity on $E$, by standard arguments. Of
course, $\{f_j\}_{j=1}^\infty$ converges to $f$ uniformly on $E$ if and only if $\{f_j\}_{j=1}^\infty$ converges
to $f$ with respect to the $\ell^\infty$ norm, in the sense that

(5.80)  $\lim_{j \to \infty} \|f_j - f\|_\infty = 0$.

Let $\{f_j\}_{j=1}^\infty$ be a sequence of functions on $E$ in $\ell^p(E, \mathbf{R})$ or $\ell^p(E, \mathbf{C})$ for some
$p$, $0 < p \le \infty$. As usual, we say that $\{f_j\}_{j=1}^\infty$ is a Cauchy sequence in $\ell^p(E, \mathbf{R})$
or $\ell^p(E, \mathbf{C})$, as appropriate, if for each $\epsilon > 0$ there is an $L(\epsilon) \ge 1$ such that

(5.81)  $\|f_j - f_l\|_p < \epsilon$

for every $j, l \ge L(\epsilon)$. This is equivalent to saying that $\{f_j\}_{j=1}^\infty$ is a Cauchy
sequence with respect to the metric

(5.82)  $d_p(g, h) = \|g - h\|_p$

on $\ell^p(E, \mathbf{R})$ or $\ell^p(E, \mathbf{C})$ when $p \ge 1$, and with respect to the metric

(5.83)  $d_p(g, h) = \|g - h\|_p^p$

when $0 < p \le 1$. We would like to show that $\{f_j\}_{j=1}^\infty$ converges to some function
$f$ on $E$ in $\ell^p(E, \mathbf{R})$ or $\ell^p(E, \mathbf{C})$, as appropriate, in the sense that

(5.84)  $\lim_{j \to \infty} \|f_j - f\|_p = 0$,

and thereby conclude that $\ell^p(E, \mathbf{R})$ and $\ell^p(E, \mathbf{C})$ are complete as metric spaces.

To do this, observe first that $\{f_j(x)\}_{j=1}^\infty$ is a Cauchy sequence of real or
complex numbers, as appropriate, for every $x \in E$, because

(5.85)  $|f_j(x) - f_l(x)| \le \|f_j - f_l\|_p$

for every $x \in E$, $j, l \ge 1$, and $0 < p \le \infty$. Thus $\{f_j(x)\}_{j=1}^\infty$ converges to a real
or complex number $f(x)$, as appropriate, for each $x \in E$, since every Cauchy
sequence of real or complex numbers converges. Note that

(5.86)  $\|f_j\|_p \le \|f_{L(1)}\|_p + 1$

for every $j \ge L(1)$ when $p \ge 1$, and that

(5.87)  $\|f_j\|_p^p \le \|f_{L(1)}\|_p^p + 1$

for every $j \ge L(1)$ when $0 < p \le 1$, by applying the Cauchy condition (5.81)
with $\epsilon = 1$ and $l = L(1)$, and using the corresponding form of the triangle
inequality. This implies that $f \in \ell^p(E, \mathbf{R})$ or $\ell^p(E, \mathbf{C})$, as appropriate, by the
earlier remarks about pointwise convergent sequences of functions on $E$. We
also get that

(5.88)  $\|f_j - f\|_p \le \epsilon$

for every $j \ge L(\epsilon)$, by applying the earlier remarks to $f_j - f_l$ as a sequence in $l$
for each $j$ and using the Cauchy condition (5.81) again, so that (5.84) holds, as
desired.



5.7 Bounded linear functionals 

Let $p$, $q$ be real numbers such that $1 < p, q < \infty$ and

(5.89)  $\frac{1}{p} + \frac{1}{q} = 1$,

so that $p$, $q$ are conjugate exponents. If $f$, $g$ are real or complex-valued functions
on a nonempty set $E$ which are $p$-summable and $q$-summable, respectively, then
their product $f g$ is a summable function on $E$, and

(5.90)  $\sum_{x \in E} |f(x)| \, |g(x)| \le \|f\|_p \|g\|_q$.

This is Hölder's inequality in the present context, which follows easily from the
version (1.66) for finite sums. We can also allow $p = 1$, $q = \infty$ or $p = \infty$, $q = 1$,
using bounded functions on $E$ when the corresponding exponent is infinite.
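
To make (5.90) concrete, the following Python sketch checks Hölder's inequality
for one pair of finitely supported functions, first with conjugate exponents
$p = 3$, $q = 3/2$, and then in the endpoint case $p = 1$, $q = \infty$. The helper
names and the sample data are assumptions chosen for illustration.

```python
def lp_norm(f, p):
    """Return (sum |f(x)|^p)^(1/p) for a finitely supported f given as a dict."""
    return sum(abs(v) ** p for v in f.values()) ** (1.0 / p)


def sup_norm(f):
    """Return max |f(x)| for a finitely supported f."""
    return max(abs(v) for v in f.values())


f = {0: 1.0, 1: 2.0, 2: 3.0}
g = {0: 3.0, 1: 1.0, 2: 2.0}

# Left side of (5.90): the sum of |f(x)| |g(x)| over the common support.
lhs = sum(abs(f[x]) * abs(g.get(x, 0.0)) for x in f)

# Conjugate exponents: 1/p + 1/q = 1.
p = 3.0
q = p / (p - 1.0)
holder_ok = lhs <= lp_norm(f, p) * lp_norm(g, q)

# Endpoint case p = 1, q = infinity.
endpoint_ok = lhs <= lp_norm(f, 1.0) * sup_norm(g)
```

Here the left side is $3 + 2 + 6 = 11$, and in the endpoint case the right side is
$\|f\|_1 \|g\|_\infty = 6 \cdot 3 = 18$.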

A bounded linear functional on $\ell^p(E, \mathbf{R})$ or $\ell^p(E, \mathbf{C})$ is a linear mapping $\lambda$
from this space into the real or complex numbers, as appropriate, for which
there is a nonnegative real number $C$ such that

(5.91)  $|\lambda(f)| \le C \|f\|_p$

for every $f \in \ell^p(E, \mathbf{R})$ or $\ell^p(E, \mathbf{C})$. This definition makes sense for every $p$ in
the range $0 < p \le \infty$, but let us focus first on the case where $p \ge 1$, and consider
$p < 1$ afterwards.

If $q$ is the conjugate exponent associated to $p \ge 1$, and $g$ is a real or
complex-valued function on $E$ in $\ell^q(E, \mathbf{R})$ or $\ell^q(E, \mathbf{C})$, then

(5.92)  $\lambda_g(f) = \sum_{x \in E} f(x) g(x)$

defines a bounded linear functional on $\ell^p(E, \mathbf{R})$ or $\ell^p(E, \mathbf{C})$, as appropriate, that
satisfies (5.91) with $C = \|g\|_q$, by Hölder's inequality. One can also check that
$\|g\|_q$ is the smallest value of $C$ for which (5.91) holds, which is analogous to the
case of finite sums discussed in Section 2.2.

Conversely, let $\lambda$ be a bounded linear functional on $\ell^p(E, \mathbf{R})$ or $\ell^p(E, \mathbf{C})$,
$1 \le p < \infty$, that satisfies (5.91). If $z \in E$, then let $\delta_z$ be the function on $E$
defined by $\delta_z(y) = 1$ when $y = z$, and $\delta_z(y) = 0$ otherwise. Put

(5.93)  $g(z) = \lambda(\delta_z)$

for every $z \in E$, so that

(5.94)  $\lambda(f) = \sum_{x \in E} f(x) g(x)$

when $f$ has finite support on $E$. If $A$ is a finite subset of $E$, then one can show
that

(5.95)  $\Big( \sum_{x \in A} |g(x)|^q \Big)^{1/q} \le C$

when $q < \infty$, and

(5.96)  $\max_{x \in A} |g(x)| \le C$

when $q = \infty$, using suitable choices of functions $f$ supported on $A$. This is also
very similar to the discussion in Section 2.2.

It follows that $g$ is $q$-summable when $q < \infty$, and that $g$ is bounded when
$q = \infty$, with

(5.97)  $\|g\|_q \le C$

in both cases. Thus we can define $\lambda_g$ as a bounded linear functional on $\ell^p(E, \mathbf{R})$
or $\ell^p(E, \mathbf{C})$, as appropriate, as in (5.92), and

(5.98)  $\lambda(f) = \lambda_g(f)$

when $f$ has finite support on $E$, as in (5.94). To show that this holds for every
$p$-summable function $f$ on $E$, one can approximate $f$ by functions with finite
support on $E$, as in (5.75) in the previous section. This also uses the fact
that both $\lambda$ and $\lambda_g$ are bounded linear functionals on $\ell^p(E, \mathbf{R})$ or $\ell^p(E, \mathbf{C})$, as
appropriate.
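
The recipe (5.93) for recovering $g$ from a functional can be illustrated with a small
Python sketch on the positive integers. The functional `lam` below, with weights
$2^{-x}$, is a hypothetical example chosen only for this illustration, and finitely
supported functions are stored as dictionaries.

```python
def delta(z):
    """The function delta_z: equal to 1 at z and 0 elsewhere."""
    return {z: 1.0}


def lam(f):
    """A hypothetical bounded linear functional on finitely supported f.

    The weights 2**(-x) are an assumption made only for this illustration.
    """
    return sum(value * 2.0 ** (-x) for x, value in f.items())


def g(z):
    """The representing function from (5.93): g(z) = lam(delta_z)."""
    return lam(delta(z))


f = {1: 4.0, 2: -3.0, 5: 8.0}

# (5.94): lam(f) agrees with the sum of f(x) g(x) over the support of f.
direct = lam(f)
via_g = sum(value * g(x) for x, value in f.items())
```

In this example $g(x) = 2^{-x}$ is summable, so the functional is bounded on
$\ell^p$ of the positive integers for every $p \ge 1$, by Hölder's inequality.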

If $p = \infty$, then it is better to consider bounded linear functionals on $c_0(E, \mathbf{R})$,
$c_0(E, \mathbf{C})$ instead of $\ell^\infty(E, \mathbf{R})$, $\ell^\infty(E, \mathbf{C})$. As before, a bounded linear functional
on $c_0(E, \mathbf{R})$ or $c_0(E, \mathbf{C})$ is a linear mapping $\lambda$ from this space to the real or
complex numbers, as appropriate, for which there is a nonnegative real number
$C$ such that

(5.99)  $|\lambda(f)| \le C \|f\|_\infty$

for every function $f$ on $E$ that vanishes at infinity. More precisely, this is a
bounded linear functional on $c_0(E, \mathbf{R})$ or $c_0(E, \mathbf{C})$ with respect to the $\ell^\infty$ norm,
which is the natural norm in this case.

If $g$ is a summable function on $E$, then we have seen that (5.92) defines
a bounded linear functional $\lambda_g$ on $\ell^\infty(E, \mathbf{R})$ or $\ell^\infty(E, \mathbf{C})$, as appropriate, and
that $\lambda_g$ satisfies (5.91) with $p = \infty$ and $C = \|g\|_1$. Hence the restriction of
$\lambda_g$ to $c_0(E, \mathbf{R})$ or $c_0(E, \mathbf{C})$, as appropriate, is a bounded linear functional that
satisfies (5.99) with $C = \|g\|_1$. One can check that $\|g\|_1$ is still the smallest value
of $C$ for which (5.99) holds, even when we restrict our attention to functions $f$
that vanish at infinity on $E$, instead of considering all bounded functions on $E$.

Conversely, suppose that $\lambda$ is a bounded linear functional on $c_0(E, \mathbf{R})$ or
$c_0(E, \mathbf{C})$ that satisfies (5.99). Let $g$ be the function on $E$ defined by (5.93), as
before, so that (5.94) holds when $f$ has finite support on $E$. If $A$ is a finite
subset of $E$, then one can show that

(5.100)  $\sum_{x \in A} |g(x)| \le C$,

using suitable choices of functions $f$ supported on $A$. This implies that $g$ is a
summable function on $E$, with

(5.101)  $\|g\|_1 \le C$.

To show that $\lambda(f) = \lambda_g(f)$ for every function $f$ that vanishes at infinity on $E$,
one can approximate such a function $f$ by functions with finite support on $E$
with respect to the $\ell^\infty$ norm, as in (5.74) in the previous section.

If $g$ is a bounded real or complex-valued function on $E$, then we have seen
that (5.92) defines a bounded linear functional $\lambda_g$ on $\ell^1(E, \mathbf{R})$ or $\ell^1(E, \mathbf{C})$, as
appropriate, and that $\lambda_g$ satisfies (5.91) with $p = 1$ and $C = \|g\|_\infty$. If $0 < p < 1$,
then we have also seen that every $p$-summable function $f$ on $E$ is summable and
satisfies

(5.102)  $\|f\|_1 \le \|f\|_p$,

as in (5.72) in the previous section, with $q = 1$. It follows that the restriction of
$\lambda_g$ to $\ell^p(E, \mathbf{R})$ or $\ell^p(E, \mathbf{C})$, as appropriate, is also a bounded linear functional
that satisfies (5.91) with $C = \|g\|_\infty$. It is easy to see that $\|g\|_\infty$ is still the
smallest value of $C$ for which (5.91) holds, because

(5.103)  $\|\delta_z\|_p = 1$

for every $z \in E$.

Conversely, suppose that $\lambda$ is a bounded linear functional on $\ell^p(E, \mathbf{R})$ or
$\ell^p(E, \mathbf{C})$ that satisfies (5.91), where $0 < p < 1$. As usual, we can define a
function $g$ on $E$ by (5.93), so that (5.94) holds when $f$ has finite support on $E$.
It is easy to see that $g$ is a bounded function on $E$, with

(5.104)  $\|g\|_\infty \le C$,

by applying these conditions to $f = \delta_z$ for each $z \in E$. One can then check that
$\lambda(f) = \lambda_g(f)$ for every $p$-summable function $f$ on $E$, by approximating $f$ by
functions with finite support on $E$, as in (5.75) in the previous section.

5.8 Another convergence theorem 

Let $E$ be a nonempty set, and let $1 \le p \le \infty$ be given. Also let $\{f_j\}_{j=1}^\infty$ be
a sequence of real or complex-valued functions on $E$ that converges pointwise
to another function $f$ on $E$. Suppose that $f_j$ is $p$-summable for each $j$ when
$p < \infty$, and that $f_j$ is bounded for each $j$ when $p = \infty$, with

(5.105)  $\|f_j\|_p \le C$

for some nonnegative real number $C$ and for every $j$ in both cases. As in Section
5.6, this implies that $f$ is $p$-summable when $p < \infty$, and that $f$ is bounded when
$p = \infty$, with $\|f\|_p \le C$ in both cases.

Let $1 \le q \le \infty$ be the exponent conjugate to $p$, so that $1/p + 1/q = 1$, and let
$g$ be a real or complex-valued function on $E$ which is $q$-summable when $q < \infty$,
and which vanishes at infinity on $E$ when $q = \infty$. In particular, $g$ is bounded
when $q = \infty$. As in the previous section, $f_j g$ is summable on $E$ for each $j$, as
is $f g$. Under these conditions, we would like to show that

(5.106)  $\lim_{j \to \infty} \sum_{x \in E} f_j(x) g(x) = \sum_{x \in E} f(x) g(x)$.

The proof is similar to that of the dominated convergence theorem in Section
5.3, and in fact one can derive this from the dominated convergence theorem
when $p = \infty$.

Equivalently, we would like to show that

(5.107)  $\lim_{j \to \infty} \sum_{x \in E} (f_j(x) - f(x)) g(x) = 0$.

Let $\epsilon > 0$ be given, and let $A$ be a finite subset of $E$ such that

(5.108)  $\Big( \sum_{x \in E \setminus A} |g(x)|^q \Big)^{1/q} < \epsilon$

when $q < \infty$, and $|g(x)| < \epsilon$ for every $x \in E \setminus A$ when $q = \infty$. Using this, it is
easy to see that

(5.109)  $\Big| \sum_{x \in E \setminus A} (f_j(x) - f(x)) g(x) \Big| \le \epsilon \|f_j - f\|_p \le 2 C \epsilon$

for each $j$, by Hölder's inequality. We also have that

(5.110)  $\Big| \sum_{x \in A} (f_j(x) - f(x)) g(x) \Big| < \epsilon$

for all sufficiently large $j$, because $\{f_j\}_{j=1}^\infty$ converges to $f$ pointwise on $E$, and
because $A$ has only finitely many elements. The desired conclusion (5.107)
follows by combining these two statements.

Note that (5.107) follows directly from Hölder's inequality if we ask that

(5.111)  $\lim_{j \to \infty} \|f_j - f\|_p = 0$.

In this case, it would also have been sufficient to ask that $g$ be bounded on $E$
when $q = \infty$.
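
The difference between pointwise convergence with bounded norms and norm
convergence as in (5.111) can be seen in a simple example, sketched here in Python
under the assumption that $E$ is the set of positive integers and $p = q = 2$. The
functions $f_j = \delta_j$ and $g(x) = 1/x$ are choices made for this illustration.

```python
import math


def pairing(f, g):
    """Sum of f(x) g(x) over the finite support of f (f is a dict)."""
    return sum(value * g(x) for x, value in f.items())


def g(x):
    """A 2-summable function on the positive integers: g(x) = 1/x."""
    return 1.0 / x


def f(j):
    """f_j = delta_j, which tends to 0 pointwise as j grows."""
    return {j: 1.0}


indices = (1, 10, 100, 1000)

# The pairings in (5.106) tend to 0, the pairing of the pointwise limit with g.
pairings = [pairing(f(j), g) for j in indices]

# The l^2 norms of f_j - 0 stay equal to 1, so (5.111) does not hold.
l2_norms = [math.sqrt(sum(v * v for v in f(j).values())) for j in indices]
```

So the conclusion (5.106) holds here even though $\{f_j\}_{j=1}^\infty$ does not converge to $0$
in the $\ell^2$ norm.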



Chapter 6 

Banach and Hilbert spaces 



6.1 Basic concepts 

Let $V$ be a vector space over the real or complex numbers, and let $\|v\|$ be a
norm on $V$. In this chapter, $V$ is allowed to be infinite-dimensional, but the
definition of a norm on $V$ is the same as in the finite-dimensional case.

As usual, a sequence $\{v_j\}_{j=1}^\infty$ of elements of $V$ is said to converge to $v \in V$
if for each $\epsilon > 0$ there is an $L \ge 1$ such that

(6.1)  $\|v_j - v\| < \epsilon$

for every $j \ge L$. In this case, we put

(6.2)  $\lim_{j \to \infty} v_j = v$,

and call $v$ the limit of the sequence $\{v_j\}_{j=1}^\infty$. It is easy to see that the limit of
a convergent sequence is unique when it exists.

Similarly, a sequence $\{v_j\}_{j=1}^\infty$ in $V$ is a Cauchy sequence if for each $\epsilon > 0$
there is an $L \ge 1$ such that

(6.3)  $\|v_j - v_l\| < \epsilon$

for every $j, l \ge L$. Note that every convergent sequence is automatically a
Cauchy sequence, by a simple argument.

If every Cauchy sequence of elements of $V$ converges to an element of $V$,
then we say that $V$ is complete. This is equivalent to the completeness of $V$ as a
metric space, with respect to the metric

(6.4)  $d(v, w) = \|v - w\|$

corresponding to the norm $\|v\|$ on $V$. If $V$ is complete in this sense, then we
say that $V$ is a Banach space. If $\langle v, w \rangle$ is an inner product on $V$, and if $V$ is
complete with respect to the associated norm

(6.5)  $\|v\| = \langle v, v \rangle^{1/2}$,

then we say that $V$ is a Hilbert space.

Of course, the real and complex numbers are complete with respect to their
standard norms, as in Section 1.1. If $V = \mathbf{R}^n$ or $\mathbf{C}^n$ for some positive integer
$n$, and $\|v\|$ is one of the norms $\|v\|_p$ defined in Section 2.1, $1 \le p \le \infty$, then it
is easy to see that $V$ is complete, by reducing to the $n = 1$ case. If $V = \mathbf{R}^n$ or
$\mathbf{C}^n$ equipped with any norm $\|v\|$, then one can also check that $V$ is complete
with respect to $\|v\|$, by reducing to the case of the standard norm on $V$ using
the remarks at the end of Section 2.1. This implies that any finite-dimensional
vector space $V$ over the real or complex numbers is complete with respect to
any norm $\|v\|$ on $V$, because $V$ is isomorphic to $\mathbf{R}^n$ or $\mathbf{C}^n$ for some positive
integer $n$, or $V = \{0\}$. If $E$ is a nonempty set and $V = \ell^p(E, \mathbf{R})$ or $\ell^p(E, \mathbf{C})$ for
some $1 \le p \le \infty$, then $V$ is complete with respect to the corresponding norm
$\|f\|_p$, as in Section 5.6.

Let $V$ be a real or complex vector space with a norm $\|v\|$. A subset $W$ of $V$
is said to be a closed set in $V$ if for every sequence $\{w_j\}_{j=1}^\infty$ of elements of $W$
that converges to an element $w$ of $V$, we have that $w \in W$. This is equivalent
to other standard definitions of closed sets in metric spaces, in terms of a set
containing all of its limit points, or being a closed set when the complement
is an open set. If $V$ is complete with respect to $\|v\|$, and $W$ is a closed linear
subspace of $V$, then it follows that $W$ is complete with respect to the restriction
of the norm $\|v\|$ to $v \in W$. This is because a Cauchy sequence in $W$ is also
a Cauchy sequence in $V$ in this situation, which converges to an element of $V$
when $V$ is complete, and the limit is in $W$ when $W$ is a closed set in $V$.

In particular, if $E$ is a nonempty set, then $c_0(E, \mathbf{R})$ and $c_0(E, \mathbf{C})$ are closed
linear subspaces of $\ell^\infty(E, \mathbf{R})$ and $\ell^\infty(E, \mathbf{C})$, as in Section 5.6. Hence $c_0(E, \mathbf{R})$
and $c_0(E, \mathbf{C})$ are complete with respect to the $\ell^\infty$ norm, as in the preceding
paragraph, because $\ell^\infty(E, \mathbf{R})$ and $\ell^\infty(E, \mathbf{C})$ are complete.

Let $V$ be the vector space of continuous real or complex-valued functions
on the closed unit interval $[0, 1]$, with respect to pointwise addition and scalar
multiplication. Remember that continuous functions on $[0, 1]$ are automatically
bounded, because $[0, 1]$ is compact. Thus

(6.6)  $\|f\| = \sup_{0 \le x \le 1} |f(x)|$

defines a norm on $V$, known as the supremum norm. It is well known that
$V$ is complete with respect to this norm, because of the fact that the limit of
a uniformly convergent sequence of continuous functions is also continuous. If
$1 \le p < \infty$, then

(6.7)  $\|f\|_p = \Big( \int_0^1 |f(x)|^p \, dx \Big)^{1/p}$

also defines a norm on $V$, because of Minkowski's inequality in Section 1.3.
However, it is well known that $V$ is not complete with respect to this norm for
any $p < \infty$. To get a complete space, one can use Lebesgue integrals. Similarly,
the vector space of bounded continuous real or complex-valued functions on
any topological space is complete with respect to the corresponding supremum
norm. There are also $L^p$ spaces associated to any measure space, which are
Banach spaces when $p \ge 1$, and Hilbert spaces when $p = 2$. Note that the $\ell^p$
spaces discussed in the previous chapter may be considered as $L^p$ spaces with
respect to counting measure on a set $E$.



6.2 Sequences and series 

Let $V$ be a real or complex vector space with a norm $\|v\|$. If $\{v_j\}_{j=1}^\infty$, $\{w_j\}_{j=1}^\infty$
are sequences in $V$ that converge to $v, w \in V$, respectively, then

(6.8)  $\lim_{j \to \infty} (v_j + w_j) = v + w$.

This can be shown in essentially the same way as for sequences of real or complex
numbers. Similarly, if $\{t_j\}_{j=1}^\infty$ is a sequence of real or complex numbers, as
appropriate, that converges to the real or complex number $t$, and if $\{v_j\}_{j=1}^\infty$ is
a sequence of vectors in $V$ that converges to $v \in V$, then

(6.9)  $\lim_{j \to \infty} t_j v_j = t v$.

Equivalently, this means that addition and scalar multiplication are continuous
on $V$ with respect to the metric associated to the norm.
As in (2.7) in Section 2.1, one can check that

(6.10)  $\big| \|v\| - \|w\| \big| \le \|v - w\|$

for every $v, w \in V$, using the triangle inequality. If $\{v_j\}_{j=1}^\infty$ is a sequence in $V$
that converges to $v \in V$, then it follows that

(6.11)  $\lim_{j \to \infty} \|v_j\| = \|v\|$,

as a sequence of real numbers. This is the same as saying that $\|v\|$ is a continuous
real-valued function on $V$ with respect to the metric associated to the norm.

Let $\sum_{j=1}^\infty a_j$ be an infinite series whose terms $a_j$ are elements of $V$. As
usual, we say that $\sum_{j=1}^\infty a_j$ converges in $V$ if the corresponding sequence of
partial sums $\sum_{j=1}^n a_j$ converges in $V$, in which case we put

(6.12)  $\sum_{j=1}^\infty a_j = \lim_{n \to \infty} \sum_{j=1}^n a_j$.

If $\sum_{j=1}^\infty a_j$ and $\sum_{j=1}^\infty b_j$ are convergent series with terms in $V$, then it is easy
to see that $\sum_{j=1}^\infty (a_j + b_j)$ also converges, and that

(6.13)  $\sum_{j=1}^\infty (a_j + b_j) = \sum_{j=1}^\infty a_j + \sum_{j=1}^\infty b_j$,

because of the corresponding fact about sums of convergent sequences mentioned
earlier. Similarly, if $\sum_{j=1}^\infty a_j$ is a convergent series with terms in $V$, and if $t$ is
a real or complex number, as appropriate, then $\sum_{j=1}^\infty t a_j$ also converges, and

(6.14)  $\sum_{j=1}^\infty t a_j = t \sum_{j=1}^\infty a_j$.

An infinite series $\sum_{j=1}^\infty a_j$ with terms in $V$ is said to converge absolutely if

(6.15)  $\sum_{j=1}^\infty \|a_j\|$

converges as an infinite series of nonnegative real numbers. In this case, one can
show that the partial sums of $\sum_{j=1}^\infty a_j$ form a Cauchy sequence, because

(6.16)  $\Big\| \sum_{j=l}^n a_j \Big\| \le \sum_{j=l}^n \|a_j\|$

for every $n \ge l \ge 1$. If $V$ is complete, then it follows that $\sum_{j=1}^\infty a_j$ converges in
$V$. We also get that

(6.17)  $\Big\| \sum_{j=1}^\infty a_j \Big\| \le \sum_{j=1}^\infty \|a_j\|$.

Conversely, suppose that $\{v_j\}_{j=1}^\infty$ is a Cauchy sequence of elements of $V$. It
is easy to see that there is a subsequence $\{v_{j_l}\}_{l=1}^\infty$ of $\{v_j\}_{j=1}^\infty$ such that

(6.18)  $\|v_{j_l} - v_{j_{l+1}}\| < 2^{-l}$

for each $l$, so that

(6.19)  $\sum_{l=1}^\infty (v_{j_l} - v_{j_{l+1}})$

converges absolutely. Of course,

(6.20)  $\sum_{l=1}^n (v_{j_l} - v_{j_{l+1}}) = v_{j_1} - v_{j_{n+1}}$

for each $n$, which implies that the series (6.19) converges in $V$ if and only if
$\{v_{j_l}\}_{l=1}^\infty$ converges in $V$. If $\{v_{j_l}\}_{l=1}^\infty$ converges in $V$, then one can check that
$\{v_j\}_{j=1}^\infty$ also converges to the same element of $V$, because $\{v_j\}_{j=1}^\infty$ is a Cauchy
sequence. If every absolutely convergent series in $V$ converges, then it follows
that every Cauchy sequence in $V$ converges, which is to say that $V$ is complete.

Suppose now that the norm $\|v\|$ on $V$ is associated to an inner product
$\langle v, w \rangle$ on $V$ in the usual way. Let $\sum_{j=1}^\infty a_j$ be an infinite series whose terms are
pairwise-orthogonal vectors in $V$, in the sense that

(6.21)  $\langle a_j, a_k \rangle = 0$

when $j \ne k$. In this case,

(6.22)  $\Big\| \sum_{j=1}^n a_j \Big\|^2 = \sum_{j=1}^n \|a_j\|^2$

for each $n$. If $\sum_{j=1}^\infty a_j$ converges in $V$, then $\sum_{j=1}^\infty \|a_j\|^2$ converges in $\mathbf{R}$, and

(6.23)  $\Big\| \sum_{j=1}^\infty a_j \Big\|^2 = \sum_{j=1}^\infty \|a_j\|^2$.

Of course, the orthogonality condition (6.21) implies that

(6.24)  $\Big\| \sum_{j=l}^n a_j \Big\|^2 = \sum_{j=l}^n \|a_j\|^2$

for every $n \ge l \ge 1$. If $\sum_{j=1}^\infty \|a_j\|^2$ converges, then it follows that the partial
sums of $\sum_{j=1}^\infty a_j$ form a Cauchy sequence in $V$. If $V$ is complete, then $\sum_{j=1}^\infty a_j$
converges in $V$ under these conditions.



6.3 Minimizing distances 



Let $V$ be a real or complex vector space with an inner product $\langle v, w \rangle$, and let
$\|v\|$ be the corresponding norm on $V$. Also let $E$ be a nonempty subset of $V$,
let $v$ be an element of $V$, and put

(6.25)  $\rho = \inf \{ \|v - w\| : w \in E \}$.

Thus for each positive integer $j$ there is a $w_j \in E$ such that

(6.26)  $\|v - w_j\| < \rho + \frac{1}{j}$.

Note that

(6.27)  $\Big\| \frac{x + y}{2} \Big\|^2 + \Big\| \frac{x - y}{2} \Big\|^2 = \frac{1}{2} (\|x\|^2 + \|y\|^2)$

for every $x, y \in V$, by applying the parallelogram law (2.90) to $x/2$, $y/2$. If we
take $x = v - w_j$ and $y = v - w_l$, then we get that

(6.28)  $\Big\| v - \frac{w_j + w_l}{2} \Big\|^2 + \frac{1}{4} \|w_j - w_l\|^2 = \frac{1}{2} (\|v - w_j\|^2 + \|v - w_l\|^2)$.

Combining this with (6.26) gives

(6.29)  $\Big\| v - \frac{w_j + w_l}{2} \Big\|^2 + \frac{1}{4} \|w_j - w_l\|^2 < \rho^2 + \rho \Big( \frac{1}{j} + \frac{1}{l} \Big) + \frac{1}{2} \Big( \frac{1}{j^2} + \frac{1}{l^2} \Big)$.

Suppose now that $E$ is a convex set in $V$. This implies that

(6.30)  $\frac{w_j + w_l}{2} \in E$

for each $j$, $l$, and hence that

(6.31)  $\Big\| v - \frac{w_j + w_l}{2} \Big\| \ge \rho$.

In this case, we get that

(6.32)  $\frac{1}{4} \|w_j - w_l\|^2 < \rho \Big( \frac{1}{j} + \frac{1}{l} \Big) + \frac{1}{2} \Big( \frac{1}{j^2} + \frac{1}{l^2} \Big)$

for each $j$, $l$. Thus $\{w_j\}_{j=1}^\infty$ is a Cauchy sequence in $V$ under these conditions.

If $V$ is complete, then it follows that $\{w_j\}_{j=1}^\infty$ converges to an element $w$ of
$V$. If $E$ is a closed set in $V$, then $w \in E$. Moreover,

(6.33)  $\|v - w\| = \rho$,

so that $w$ minimizes the distance to $v$ among elements of $E$.
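
For a concrete instance of a distance-minimizing point, consider the closed unit
cube $[0, 1]^3$ in $\mathbf{R}^3$ with the standard Euclidean norm, which is a nonempty closed
convex set in a Hilbert space. The Python sketch below assumes the familiar fact
that the nearest point of the cube is obtained by clamping each coordinate, and
compares it with a grid of other points of the cube. The names `dist` and
`nearest_in_box` are hypothetical helpers for this illustration.

```python
import itertools
import math


def dist(v, w):
    """Euclidean distance between two vectors given as sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w)))


def nearest_in_box(v):
    """Clamp each coordinate to [0, 1]; this is the nearest point of the cube."""
    return [min(1.0, max(0.0, a)) for a in v]


v = [1.5, -0.25, 0.4]
w = nearest_in_box(v)
rho = dist(v, w)

# Compare with every point of a grid in the cube: none of them is closer to v.
grid = [i / 10 for i in range(11)]
no_closer = all(
    dist(v, c) >= rho - 1e-12 for c in itertools.product(grid, repeat=3)
)
```

The grid comparison is only a finite spot check, not a proof that $w$ is the
minimizer.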

This argument also works in a large class of Banach spaces. More precisely,
a norm $\|v\|$ on $V$ is said to be uniformly convex if for each $\epsilon > 0$ there is a
$\delta(\epsilon) > 0$ such that $\delta(\epsilon) < 1$ and

(6.34)  $\Big\| \frac{u + z}{2} \Big\| \le 1 - \delta(\epsilon)$

for every $u, z \in V$ such that $\|u\| = \|z\| = 1$ and $\|u - z\| \ge \epsilon$. Equivalently, this
means that

(6.35)  $\|u - z\| < \epsilon$

for every $u, z \in V$ such that $\|u\| = \|z\| = 1$ and $\|(u + z)/2\| > 1 - \delta(\epsilon)$. If
$\|v\|$ is associated to an inner product on $V$, then it is easy to see that $\|v\|$
is uniformly convex, using the parallelogram law. It is well known that the
$L^p$ norm is uniformly convex when $1 < p < \infty$, as a consequence of famous
inequalities of Clarkson. If $\|v\|$ is uniformly convex, then one can modify the
previous arguments to show that the minimum of the distance from a point
$v \in V$ to a nonempty closed convex set $E \subseteq V$ is attained when $V$ is complete.
As before, the main step is to show that a minimizing sequence $\{w_j\}_{j=1}^\infty$ is a
Cauchy sequence when $\|v\|$ is uniformly convex.



6.4 Orthogonal projections 

Let $V$ be a real or complex vector space with an inner product $\langle v, w \rangle$, and let
$\|v\|$ be the corresponding norm on $V$. Suppose that $V$ is complete, so that $V$ is
a Hilbert space, and let $W$ be a closed linear subspace of $V$. If $v$ is any element
of $V$, then there is a $w \in W$ whose distance to $v$ is minimal among elements of
$W$, as in the previous section. Equivalently,

(6.36)  $\|v - w\| \le \|v - w + z\|$

for every $z \in W$, because $W$ is a linear subspace of $V$. Using the inner product,
we get that

(6.37)  $\|v - w\|^2 \le \|v - w + z\|^2$
        $= \|v - w\|^2 + \langle v - w, z \rangle + \langle z, v - w \rangle + \|z\|^2$
        $= \|v - w\|^2 + 2 \mathop{\rm Re} \langle v - w, z \rangle + \|z\|^2$.

More precisely, it is not necessary to take the real part of $\langle v - w, z \rangle$ in the last
step when $V$ is a real vector space, but this is needed when $V$ is complex. Of
course, this inequality reduces to

(6.38)  $0 \le 2 \mathop{\rm Re} \langle v - w, z \rangle + \|z\|^2$,

by subtracting $\|v - w\|^2$ from both sides.
Let $t$ be a real number, and put

(6.39)  $f(t) = 2 \mathop{\rm Re} \langle v - w, t z \rangle + \|t z\|^2 = 2 t \mathop{\rm Re} \langle v - w, z \rangle + t^2 \|z\|^2$.

If $z \in W$, then $t z \in W$, and hence $f(t) \ge 0$, by (6.38). Since $f(0) = 0$, the
minimum of $f(t)$ is attained at $t = 0$, which implies that the derivative of $f(t)$
at $t = 0$ is also equal to $0$. This shows that

(6.40)  $\mathop{\rm Re} \langle v - w, z \rangle = 0$

for every $z \in W$, which is the same as saying that

(6.41)  $\langle v - w, z \rangle = 0$

for every $z \in W$ in the real case. In the complex case, one can get (6.41) by
applying (6.40) to $z$ and to $i z$.
Conversely, (6.41) implies that

(6.42)  $\|v - w + z\|^2 = \|v - w\|^2 + \|z\|^2$

for every $z \in W$, and hence that $w$ minimizes the distance to $v$ among elements
of $W$. As in Section 2.6, $w \in W$ is uniquely determined by the condition
that (6.41) holds for every $z \in W$. Put $P_W(v) = w$, which is the orthogonal
projection of $v$ onto $W$. Note that

(6.43)  $\|v\|^2 = \|v - P_W(v)\|^2 + \|P_W(v)\|^2$,

which is the same as (6.42) with $z = w$. In particular,

(6.44)  $\|P_W(v)\| \le \|v\|$

for every $v \in V$.

If $v_1, v_2 \in V$, then $P_W(v_1) + P_W(v_2) \in W$, and

(6.45)  $(v_1 + v_2) - (P_W(v_1) + P_W(v_2)) = (v_1 - P_W(v_1)) + (v_2 - P_W(v_2))$

is orthogonal to every element of $W$, by the corresponding properties of $P_W(v_1)$
and $P_W(v_2)$. This implies that

(6.46)  $P_W(v_1 + v_2) = P_W(v_1) + P_W(v_2)$,

because $P_W(v_1 + v_2)$ is characterized by these conditions, as in the preceding
paragraph. Similarly,

(6.47)  $P_W(t v) = t P_W(v)$

for every $v \in V$ and $t \in \mathbf{R}$ or $\mathbf{C}$, as appropriate, because $t P_W(v) \in W$ and

(6.48)  $t v - t P_W(v) = t (v - P_W(v))$

is orthogonal to every element of $W$, by the corresponding properties of $P_W(v)$.
Thus $P_W$ is a linear mapping from $V$ into $W$. Of course, $P_W(v) = v$ when
$v \in W$.



6.5 Orthonormal sequences 

Let $V$ be a real or complex vector space with an inner product $\langle v, w \rangle$ again,
and let $\|v\|$ be the corresponding norm on $V$. Suppose that $e_1, e_2, e_3, \ldots$ is an
infinite sequence of orthonormal vectors in $V$, so that

(6.49)  $\langle e_j, e_k \rangle = 0$

when $j \ne k$, and $\|e_j\| = 1$ for each $j$. Note that any sequence of vectors in
$V$ can be modified to get an orthonormal sequence with the same linear span,
using the Gram-Schmidt process. This was already used in Section 2.6 to show
that every finite-dimensional inner product space has an orthonormal basis.
Put

(6.50)  $P_n(v) = \sum_{j=1}^n \langle v, e_j \rangle e_j$

for each $v \in V$ and positive integer $n$. This is the same as the orthogonal
projection of $v$ onto the linear subspace $W_n$ spanned by $e_1, \ldots, e_n$ in $V$, as in
Section 2.6. Remember that

(6.51)  $\|v\|^2 = \|P_n(v)\|^2 + \|v - P_n(v)\|^2 = \sum_{j=1}^n |\langle v, e_j \rangle|^2 + \|v - P_n(v)\|^2$,

as in (2.78). In particular,

(6.52)  $\sum_{j=1}^n |\langle v, e_j \rangle|^2 \le \|v\|^2$

for every $v \in V$ and $n \ge 1$. Remember also that $P_n(v) \in W_n$ minimizes the
distance to $v$ among elements of $W_n$, as in Section 2.11.
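
The formulas (6.50), (6.51), and (6.52) can be tried out in the real case with a short
Python sketch on $\mathbf{R}^4$ with the standard inner product. The function names below
are illustrative assumptions, and `gram_schmidt` uses the modified form of the
Gram-Schmidt process, subtracting components from a running remainder.

```python
import math


def inner(v, w):
    """Standard real inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(v, w))


def norm(v):
    """Norm associated to the inner product."""
    return math.sqrt(inner(v, v))


def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal list with the same linear span as `vectors`."""
    basis = []
    for v in vectors:
        r = list(v)
        for e in basis:
            c = inner(r, e)
            r = [ri - c * ei for ri, ei in zip(r, e)]
        n = norm(r)
        if n > tol:  # skip vectors already in the span of the earlier ones
            basis.append([ri / n for ri in r])
    return basis


def project(v, basis):
    """P_n(v) as in (6.50): the sum of <v, e_j> e_j over the orthonormal basis."""
    p = [0.0] * len(v)
    for e in basis:
        c = inner(v, e)
        p = [pi + c * ei for pi, ei in zip(p, e)]
    return p


es = gram_schmidt([[1.0, 1.0, 0.0, 0.0], [1.0, 0.0, 1.0, 0.0]])
v = [2.0, -1.0, 3.0, 5.0]
pv = project(v, es)
residual = [a - b for a, b in zip(v, pv)]

# Left side of (6.52); it equals ||P_n(v)||^2 and is at most ||v||^2.
bessel_sum = sum(inner(v, e) ** 2 for e in es)
```

Up to rounding error, `residual` is orthogonal to each vector in `es`, which is the
property that characterizes the orthogonal projection.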

Note that $\bigcup_{n=1}^\infty W_n$ is a linear subspace of $V$, because $W_n$ is a linear subspace
of $V$ for each $n$, and because $W_n \subseteq W_{n+1}$ for each $n$, by construction. Let $W$
be the closure of $\bigcup_{n=1}^\infty W_n$ in $V$, which is the set of $v \in V$ with the property
that for each $\epsilon > 0$ there is a $w \in \bigcup_{n=1}^\infty W_n$ such that

(6.53)  $\|v - w\| < \epsilon$.

Thus $\bigcup_{n=1}^\infty W_n \subseteq W$ automatically, and one can check that $W$ is a closed linear
subspace of $V$.

Equivalently, $v \in V$ is an element of $W$ if and only if

(6.54)  $\lim_{n \to \infty} \|v - P_n(v)\| = 0$.

More precisely, if $v$ satisfies (6.54), then it is easy to see that $v \in W$, because
$P_n(v) \in W_n$ for each $n$. Conversely, suppose that $v \in W$, and let $\epsilon > 0$ be given.
By definition of $W$, there is a positive integer $k$ and a $w \in W_k$ such that (6.53)
holds. This implies that

(6.55)  $\|v - P_n(v)\| \le \|v - w\| < \epsilon$

for every $n \ge k$, as desired, because $W_k \subseteq W_n$ for each $n \ge k$, and because $P_n(v)$
minimizes the distance to $v$ among elements of $W_n$, as before.
We would like to put

(6.56)  $P(v) = \sum_{j=1}^\infty \langle v, e_j \rangle e_j = \lim_{n \to \infty} P_n(v)$

for every $v \in V$, but we need to be careful about the existence of the limit. If
$v \in W$, then (6.54) implies that $\{P_n(v)\}_{n=1}^\infty$ converges to $v$, so that the definition
of $P(v)$ makes sense and $P(v) = v$. Otherwise, if $v$ is any element of $V$, then

(6.57)  $\sum_{j=1}^\infty |\langle v, e_j \rangle|^2$

converges and is less than or equal to $\|v\|^2$, because of (6.52). If $V$ is complete,
then it follows that the series in (6.56) converges, as in Section 6.2. In this case,
it is easy to see that $P(v) \in W$ for every $v \in V$, because $P_n(v) \in W_n$ for each
$n$. One can also check that $P$ is a linear mapping from $V$ into $W$ under these
conditions. By construction, we also have that

(6.58)  $\|P(v)\|^2 = \sum_{j=1}^\infty |\langle v, e_j \rangle|^2 \le \|v\|^2$

for every $v \in V$.

Let us suppose from now on in this section that $V$ is complete, and thus a
Hilbert space. Observe that

(6.59)  $\langle P(v), e_l \rangle = \lim_{n \to \infty} \langle P_n(v), e_l \rangle = \langle v, e_l \rangle$

for every $v \in V$ and $l \ge 1$. This uses the fact that

(6.60)  $\langle P_n(v), e_l \rangle = \langle v, e_l \rangle$

for each $n \ge l$, because of the orthonormality of the $e_j$'s. This also implicitly uses
the Cauchy-Schwarz inequality, in order to take the limit outside of the inner
product, which is basically the same as the continuity of the inner product with
respect to the associated norm. It follows that $v - P(v)$ is orthogonal to $e_l$ for
each $l$, which implies that $v - P(v)$ is orthogonal to every element of $\bigcup_{n=1}^\infty W_n$,
because of the linearity properties of the inner product. Using continuity of the
inner product again, we get that $v - P(v)$ is orthogonal to every element of the
closure $W$ of $\bigcup_{n=1}^\infty W_n$. As in Section 2.6, $P(v)$ is uniquely determined by the
conditions that $P(v) \in W$ and $v - P(v)$ is orthogonal to every element of $W$,
and hence $P$ is the same as the orthogonal projection $P_W$ of $V$ onto $W$, as in
the previous section.

If $\{a_j\}_{j=1}^\infty$ is any sequence of real or complex numbers, as
appropriate, such that $\sum_{j=1}^\infty |a_j|^2$ converges in $\mathbf{R}$, then
the same arguments show that

(6.61)  $\sum_{j=1}^\infty a_j e_j$

converges in $V$ to an element of $W$, and that

(6.62)  $\Big\| \sum_{j=1}^\infty a_j e_j \Big\|^2 = \sum_{j=1}^\infty |a_j|^2.$

If $\{b_j\}_{j=1}^\infty$ is another sequence of real or complex numbers, as
appropriate, such that $\sum_{j=1}^\infty |b_j|^2$ converges, then one can check
that

(6.63)  $\Big\langle \sum_{j=1}^\infty a_j e_j, \sum_{k=1}^\infty b_k e_k \Big\rangle = \sum_{j=1}^\infty a_j b_j$

in the real case, and

(6.64)  $\Big\langle \sum_{j=1}^\infty a_j e_j, \sum_{k=1}^\infty b_k e_k \Big\rangle = \sum_{j=1}^\infty a_j \overline{b_j}$

in the complex case, using the orthonormality of the $e_j$'s and the continuity
properties of the inner product, as before. Note that the infinite series on the
right sides of (6.63) and (6.64) are absolutely convergent under these conditions,
as in Section 5.5. If $V = W$, then the $e_j$'s are said to form an orthonormal
basis of $V$. In this case, we get a natural isomorphism between $V$ and
$\ell^2(\mathbf{Z}_+, \mathbf{R})$ or $\ell^2(\mathbf{Z}_+, \mathbf{C})$, as
appropriate, associated to this orthonormal basis for $V$.
6.6 Bounded linear functionals

Let $V$ be a real or complex vector space with a norm $\|v\|$. A linear functional
$\lambda$ on $V$ is said to be bounded if there is a nonnegative real number $C$
such that

(6.65)  $|\lambda(v)| \le C \|v\|$

for every $v \in V$. If $V$ has finite dimension, then every linear functional on
$V$ is bounded, as in Section 2.2. If $\lambda$ is a bounded linear functional on
$V$, then it is easy to see that $\lambda$ is continuous on $V$ with respect to the
metric associated to the norm. In particular, if $\{v_j\}_{j=1}^\infty$ is a
sequence of vectors in $V$ that converges to another vector $v \in V$, as in
Section 6.1, then it is easy to see that

(6.66)  $\lim_{j \to \infty} \lambda(v_j) = \lambda(v)$

in $\mathbf{R}$ or $\mathbf{C}$, as appropriate. Conversely, one can check that a
linear functional $\lambda$ on $V$ is bounded when it is continuous at $0$. Note
that continuity of a linear functional on $V$ at $0$ implies continuity at every
point in $V$, by linearity.

Suppose for the moment that $V$ is equipped with an inner product
$\langle v, w \rangle$, and that $\|v\|$ is the norm associated to this inner
product. If $w \in V$, then

(6.67)  $\lambda_w(v) = \langle v, w \rangle$

defines a bounded linear functional on $V$, since

(6.68)  $|\lambda_w(v)| \le \|v\| \, \|w\|$

for every $v \in V$, by the Cauchy-Schwarz inequality. Conversely, if $V$ is
complete, and if $\lambda$ is a bounded linear functional on $V$, then there is a
unique $w \in V$ such that $\lambda(v) = \lambda_w(v)$ for every $v \in V$. The
uniqueness of $w$ is a simple exercise that does not use the completeness of $V$,
and so we proceed now to the proof of the existence of $w$. This is trivial when
$\lambda(v) = 0$ for every $v \in V$, and hence we suppose that
$\lambda(v_0) \ne 0$ for some $v_0 \in V$.

Let $Z$ be the kernel of $\lambda$, which is to say that

(6.69)  $Z = \{v \in V : \lambda(v) = 0\}.$

It is easy to see that $Z$ is a closed linear subspace of $V$, because of the
continuity of $\lambda$ that follows from boundedness. Thus the orthogonal
projection $P_Z$ of $V$ onto $Z$ may be defined as in Section 6.4. Consider

(6.70)  $w_0 = v_0 - P_Z(v_0).$

Note that $w_0 \ne 0$, since $v_0 \notin Z$ by hypothesis, and that $w_0$ is
orthogonal to every element of $Z$, as in Section 6.4. In particular,

(6.71)  $\langle v_0, w_0 \rangle = \langle v_0 - P_Z(v_0), w_0 \rangle = \langle w_0, w_0 \rangle = \|w_0\|^2 > 0.$

Put $w = \lambda(v_0) \, \|w_0\|^{-2} \, w_0$ in the real case, and
$w = \overline{\lambda(v_0)} \, \|w_0\|^{-2} \, w_0$ in the complex case. By
construction, $\lambda_w(v_0) = \lambda(v_0)$, and $\lambda_w(z) = 0$ for every
$z \in Z$. This implies that $\lambda_w(v) = \lambda(v)$ for every $v \in V$, as
desired, because $V$ is spanned by $v_0$ and $Z$ in this situation.
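
The construction of $w$ from $v_0$ and the kernel $Z$ can be traced by hand in
$\mathbf{R}^3$. The following is a minimal sketch under stated assumptions: the
functional lam, the spanning vectors of its kernel, and the choice of v0 are
hypothetical example data, and the helper names are not from the text.

```python
import math

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def axpy(c, u, v):
    """Return c*u + v, componentwise."""
    return [c * a + b for a, b in zip(u, v)]

def orthonormalize(vectors):
    """Gram-Schmidt on linearly independent vectors."""
    basis = []
    for x in vectors:
        for e in basis:
            x = axpy(-inner(x, e), e, x)
        length = math.sqrt(inner(x, x))
        basis.append([a / length for a in x])
    return basis

def lam(v):
    """Hypothetical bounded linear functional on R^3."""
    return 2.0 * v[0] - v[1] + 3.0 * v[2]

# Two independent vectors in the kernel Z of lam (lam vanishes on both).
z_basis = orthonormalize([[1.0, 2.0, 0.0], [0.0, 3.0, 1.0]])

v0 = [1.0, 0.0, 0.0]                 # lam(v0) = 2, so v0 is not in Z
pz_v0 = [0.0, 0.0, 0.0]
for e in z_basis:                    # orthogonal projection of v0 onto Z
    pz_v0 = axpy(inner(v0, e), e, pz_v0)

w0 = [a - b for a, b in zip(v0, pz_v0)]           # as in (6.70)
w = [lam(v0) / inner(w0, w0) * a for a in w0]     # real case
```

For this example the resulting w is the coefficient vector (2, -1, 3) of lam, so
that the inner product with w reproduces lam on all of $\mathbf{R}^3$.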

Let $V$ be a real or complex vector space with an arbitrary norm $\|v\|$ again.
Suppose that $W$ is a linear subspace of $V$, and that $\lambda$ is a linear
functional on $W$ that satisfies (6.65) for some $C \ge 0$ and every $v \in W$.
Under these conditions, the Hahn-Banach theorem states that there is an extension
of $\lambda$ to a linear functional on $V$ that satisfies (6.65) for every
$v \in V$, with the same constant $C$. If $V$ is finite-dimensional, then this is
the same in essence as Theorem 2.26 in Section 2.3. Otherwise, there is an argument
using the axiom of choice, with the previous construction as an important part of
the proof. In some situations, one can use a sequence of extensions as before to
extend $\lambda$ to a dense linear subspace of $V$, and then extend $\lambda$ to
all of $V$ using continuity. At any rate, an important consequence of the
Hahn-Banach theorem is that for each $v \in V$ with $v \ne 0$ there is a bounded
linear functional $\lambda$ on $V$ such that $\lambda(v) \ne 0$. More precisely,
one can first define $\lambda$ on the 1-dimensional linear subspace of $V$ spanned
by $v$, and then use the Hahn-Banach theorem to extend $\lambda$ to a bounded
linear functional on all of $V$.



6.7 Dual spaces 

Let $V$ be a real or complex vector space with a norm $\|v\|$ again, and let $V^*$
be the space of all bounded linear functionals on $V$. This is also a vector space
over the real or complex numbers in a natural way, because the sum of two bounded
linear functionals on $V$ is also bounded, as is the product of a bounded linear
functional on $V$ by a scalar. If $\lambda$ is a bounded linear functional on $V$,
then the dual norm $\|\lambda\|_*$ of $\lambda$ is defined by

(6.72)  $\|\lambda\|_* = \sup\{|\lambda(v)| : v \in V, \ \|v\| \le 1\}.$

This is the same as (2.11) in Section 2.2, except that now we need to ask that
$\lambda$ be a bounded linear functional on $V$ to ensure that the supremum is
finite. As before, $\lambda$ satisfies (6.65) with $C = \|\lambda\|_*$, and this
is the smallest value of $C$ for which (6.65) holds.

It is easy to see that $\|\lambda\|_*$ is a norm on $V^*$, as in the
finite-dimensional case. Let us check that $V^*$ is automatically complete with
respect to the dual norm. Let $\{\lambda_j\}_{j=1}^\infty$ be a sequence of bounded
linear functionals on $V$ which is a Cauchy sequence with respect to the dual norm.
This means that for each $\epsilon > 0$ there is an $L(\epsilon) \ge 1$ such that

(6.73)  $\|\lambda_j - \lambda_l\|_* \le \epsilon$

for every $j, l \ge L(\epsilon)$, and hence

(6.74)  $|\lambda_j(v) - \lambda_l(v)| \le \epsilon \, \|v\|$

for every $v \in V$ and $j, l \ge L(\epsilon)$. In particular,
$\{\lambda_j(v)\}_{j=1}^\infty$ is a Cauchy sequence of real or complex numbers, as
appropriate, for each $v \in V$. Thus $\{\lambda_j(v)\}_{j=1}^\infty$ converges to
a real or complex number $\lambda(v)$ for each $v \in V$, by the completeness of
$\mathbf{R}$, $\mathbf{C}$. One can check that $\lambda$ defines a linear
functional on $V$, because $\lambda_j$ is linear on $V$ for each $j$. We also have
that

(6.75)  $|\lambda_j(v) - \lambda(v)| \le \epsilon \, \|v\|$

for every $v \in V$ and $j \ge L(\epsilon)$, by taking the limit as $l \to \infty$
in (6.74). Applying this with $\epsilon = 1$ and $j = L(1)$, we get that

(6.76)  $|\lambda(v)| \le |\lambda_{L(1)}(v)| + \|v\| \le (\|\lambda_{L(1)}\|_* + 1) \, \|v\|$

for every $v \in V$, so that $\lambda$ is a bounded linear functional on $V$. Using
(6.75) again, we get that $\{\lambda_j\}_{j=1}^\infty$ converges to $\lambda$ with
respect to the dual norm, as desired.

Let $V^{**}$ be the space of bounded linear functionals on $V^*$, with respect to
the dual norm $\|\lambda\|_*$ on $V^*$. If $v \in V$, then

(6.77)  $L_v(\lambda) = \lambda(v)$

defines a linear functional on $V^*$, which satisfies

(6.78)  $|L_v(\lambda)| = |\lambda(v)| \le \|\lambda\|_* \, \|v\|$

for every $\lambda \in V^*$, by the definition of $\|\lambda\|_*$. This implies
that $L_v$ is a bounded linear functional on $V^*$. More precisely, if
$\|L\|_{**}$ is the dual norm of a bounded linear functional $L$ on $V^*$ with
respect to the dual norm $\|\lambda\|_*$ on $V^*$, then (6.78) implies that

(6.79)  $\|L_v\|_{**} \le \|v\|$

for every $v \in V$. Using the Hahn-Banach theorem, one can check that

(6.80)  $\|L_v\|_{**} = \|v\|$

for every $v \in V$. The main point is to show that if $v \ne 0$, then there is a
$\lambda \in V^*$ such that $\|\lambda\|_* = 1$ and $\lambda(v) = \|v\|$, so that
equality holds in (6.78). As usual, one can start by defining $\lambda$ on the
1-dimensional subspace of $V$ spanned by $v$, and then extend $\lambda$ to all of
$V$ using the Hahn-Banach theorem.

A Banach space $V$ is said to be reflexive if every bounded linear functional
on $V^*$ is of the form $L_v$ for some $v \in V$. Note that $V$ has to be complete
for this to hold, since we already know that $V^{**}$ is complete, because it is a
dual space. It is easy to see that Hilbert spaces are reflexive, using the
characterization of their dual spaces in the previous section. It is also well
known that $L^p$ spaces are reflexive when $1 < p < \infty$, because the dual of
$L^p$ can be identified with the corresponding $L^q$ space, where
$1 < q < \infty$ is conjugate to $p$ in the usual sense that $1/p + 1/q = 1$. In
particular, $\ell^p$ spaces are reflexive when $1 < p < \infty$, by the
characterization of their dual spaces in Section 5.7. We also saw in Section 5.7
that the dual of $c_0(E)$ can be identified with $\ell^1(E)$ for any nonempty set
$E$, and that the dual of $\ell^1(E)$ can be identified with $\ell^\infty(E)$. If
$E$ is an infinite set, then $c_0(E)$ is a proper linear subspace of
$\ell^\infty(E)$, and it follows that $c_0(E)$ is not reflexive.
6.8 Bounded linear mappings

Let $V_1$ and $V_2$ be vector spaces, both real or both complex, and equipped with
norms $\|\cdot\|_1$, $\|\cdot\|_2$, respectively. A linear mapping $T$ from $V_1$
into $V_2$ is said to be bounded if

(6.81)  $\|T(v)\|_2 \le C \, \|v\|_1$

for some $C \ge 0$ and every $v \in V_1$. If $V_1$ has finite dimension, then one
can check that every linear mapping from $V_1$ into $V_2$ is bounded, using a basis
for $V_1$ and the remarks at the end of Section 2.1 to reduce to the case where
$V_1$ is $\mathbf{R}^n$ or $\mathbf{C}^n$ equipped with the standard norm. If
$V_2 = \mathbf{R}$ or $\mathbf{C}$, as appropriate, then a bounded linear mapping
from $V_1$ into $V_2$ is the same as a bounded linear functional on $V_1$. The
boundedness of any linear mapping is equivalent to suitable continuity conditions,
as in the context of linear functionals.

Let $\mathcal{BL}(V_1, V_2)$ be the space of bounded linear mappings from $V_1$
into $V_2$. It is easy to see that this is a vector space with respect to pointwise
addition and scalar multiplication. If $T$ is a bounded linear mapping from $V_1$
into $V_2$, then the operator norm of $T$ is defined by

(6.82)  $\|T\|_{op} = \sup\{\|T(v)\|_2 : v \in V_1, \ \|v\|_1 \le 1\},$

as in (2.42) in Section 2.4. Equivalently, (6.81) holds with $C = \|T\|_{op}$, and
this is the smallest value of $C$ for which (6.81) holds. One can check that (6.82)
defines a norm on $\mathcal{BL}(V_1, V_2)$. If $V_2 = \mathbf{R}$ or $\mathbf{C}$,
as appropriate, then the operator norm reduces to the dual norm on $(V_1)^*$
defined in the previous section. If $V_2$ is any vector space which is complete
with respect to the norm $\|\cdot\|_2$, then one can show that
$\mathcal{BL}(V_1, V_2)$ is complete with respect to the operator norm, in the same
way as for dual spaces.

Let $V_3$ be another vector space, which is real or complex depending on
whether $V_1$ and $V_2$ are real or complex, and let $\|\cdot\|_3$ be a norm on
$V_3$. If $T_1$ is a bounded linear mapping from $V_1$ into $V_2$, and $T_2$ is a
bounded linear mapping from $V_2$ into $V_3$, then it is easy to see that their
composition $T_2 \circ T_1$ is a bounded linear mapping from $V_1$ into $V_3$.
Moreover,

(6.83)  $\|T_2 \circ T_1\|_{op,13} \le \|T_1\|_{op,12} \, \|T_2\|_{op,23},$

where $\|\cdot\|_{op,ab}$ is the operator norm for a linear mapping from $V_a$ into
$V_b$ with $a, b = 1, 2, 3$.

A bounded linear mapping $T : V_1 \to V_2$ is said to be invertible if it is a
one-to-one linear mapping from $V_1$ onto $V_2$ whose inverse $T^{-1}$ is bounded
as a linear mapping from $V_2$ into $V_1$. Note that the composition of invertible
mappings is also invertible. If $T$ is invertible, then

(6.84)  $\|T(v)\|_2 \ge c \, \|v\|_1$

for some $c > 0$ and every $v \in V_1$. More precisely, this holds with $c$ equal
to the reciprocal of the operator norm of $T^{-1}$, by applying the boundedness of
$T^{-1}$ to $T^{-1}(T(v)) = v$. Conversely, suppose that $T$ is a bounded linear
mapping from $V_1$ into $V_2$ that satisfies (6.84). In particular, $v = 0$ when
$T(v) = 0$, so that $T$ is one-to-one. If $T$ maps $V_1$ onto $V_2$, then (6.84)
implies that $T^{-1}$ is bounded, with operator norm less than or equal to $1/c$.

If $V_1$ is complete and $T : V_1 \to V_2$ is a bounded linear mapping that
satisfies (6.84), then it is easy to see that $T(V_1)$ is also complete. This is
because a sequence $\{v_j\}_{j=1}^\infty$ of elements of $V_1$ is a Cauchy sequence
in $V_1$ if and only if $\{T(v_j)\}_{j=1}^\infty$ is a Cauchy sequence in $V_2$,
and $\{v_j\}_{j=1}^\infty$ converges to $v \in V_1$ if and only if
$\{T(v_j)\}_{j=1}^\infty$ converges to $T(v)$ in $V_2$. In this case, it follows
that $T(V_1)$ is a closed linear subspace in $V_2$. To see this, let
$\{v_j\}_{j=1}^\infty$ be a sequence of elements of $V_1$ such that
$\{T(v_j)\}_{j=1}^\infty$ converges to some $z \in V_2$, and let us check that
$z = T(v)$ for some $v \in V_1$. Note that $\{T(v_j)\}_{j=1}^\infty$ is a Cauchy
sequence in $V_2$, since it converges in $V_2$. As before, this implies that
$\{v_j\}_{j=1}^\infty$ is a Cauchy sequence in $V_1$, so that
$\{v_j\}_{j=1}^\infty$ converges to some $v \in V_1$, because $V_1$ is complete.
Thus $\{T(v_j)\}_{j=1}^\infty$ converges to $T(v)$ in $V_2$, because $T$ is
bounded, and hence $z = T(v)$, as desired.

Let $V$ be a real or complex vector space with a norm $\|\cdot\|$, and let $T$ be
a bounded linear operator on $V$. If $j$ is a positive integer, then let $T^j$ be
the composition of $j$ $T$'s, so that $T^1 = T$, $T^2 = T \circ T$, and so on. It
will be convenient to interpret $T^j$ as being the identity operator $I$ on $V$
when $j = 0$. Observe that

(6.85)  $(I - T) \Big( \sum_{j=0}^n T^j \Big) = \Big( \sum_{j=0}^n T^j \Big) (I - T) = I - T^{n+1}$

for each nonnegative integer $n$, as in the case of ordinary geometric series of
real and complex numbers. Of course,

(6.86)  $\|T^j\|_{op} \le \|T\|_{op}^j$

for each $j$, by (6.83). If $\|T\|_{op} < 1$, then we get that

(6.87)  $\sum_{j=0}^\infty \|T^j\|_{op} \le \sum_{j=0}^\infty \|T\|_{op}^j = \frac{1}{1 - \|T\|_{op}},$

by the usual formula for the sum of a geometric series.
This shows that the infinite series

(6.88)  $\sum_{j=0}^\infty T^j$

converges absolutely in the vector space $\mathcal{BL}(V) = \mathcal{BL}(V, V)$ of
bounded linear operators on $V$ when $\|T\|_{op} < 1$. If $V$ is complete, then we
have seen that $\mathcal{BL}(V)$ is complete with respect to the operator norm, and
hence that (6.88) converges in $\mathcal{BL}(V)$. We also get that

(6.89)  $(I - T) \Big( \sum_{j=0}^\infty T^j \Big) = \Big( \sum_{j=0}^\infty T^j \Big) (I - T) = I,$

by taking the limit as $n \to \infty$ in (6.85), and using the fact that
$T^{n+1} \to 0$ as $n \to \infty$ when $\|T\|_{op} < 1$. Thus $I - T$ is invertible
on $V$ when $\|T\|_{op} < 1$ and $V$ is complete.
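
The identity (6.89) is easy to test numerically in finite dimensions. The
following is a minimal sketch, assuming a hypothetical 2 x 2 real matrix T whose
rows have absolute sums at most 0.3, so that its operator norm with respect to
the maximum norm on $\mathbf{R}^2$ is less than 1; the helper names are chosen
only for this illustration.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matadd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

# Hypothetical example operator with operator norm < 1.
T = [[0.2, 0.1],
     [0.0, 0.3]]
I2 = identity(2)

partial_sum = identity(2)   # starts at T^0 = I
power = identity(2)
for _ in range(200):        # accumulate T^1, ..., T^200
    power = matmul(power, T)
    partial_sum = matadd(partial_sum, power)

I_minus_T = [[I2[i][j] - T[i][j] for j in range(2)] for i in range(2)]
product = matmul(I_minus_T, partial_sum)
error = max(abs(product[i][j] - I2[i][j]) for i in range(2) for j in range(2))
```

Here error measures how far the product of I - T with the partial sum is from the
identity; by (6.85) it is controlled by the size of the next power of T, which is
negligible after 200 terms in this example.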

Let us continue to ask that $V$ be complete. If $T$ is any bounded linear
operator on $V$, and $\lambda$ is a real or complex number, as appropriate, such
that $|\lambda| > \|T\|_{op}$, then $\lambda I - T$ is invertible on $V$. This
follows from the preceding argument applied to $\lambda^{-1} T$. Similarly, if $R$
is a bounded linear operator on $V$ which is also invertible, and if $T$ is a
bounded linear operator on $V$ that satisfies

(6.90)  $\|R^{-1}\|_{op} \, \|T\|_{op} < 1,$

then

(6.91)  $R - T = R (I - R^{-1} T)$

is invertible on $V$.

Suppose that $V$ is a complex Banach space, and let $T$ be a bounded linear
operator on $V$. The spectrum of $T$ is the set of complex numbers $\lambda$ such
that $\lambda I - T$ is not invertible on $V$. As usual, eigenvalues of $T$ are
elements of the spectrum, but the converse does not hold in infinite dimensions.
Note that

(6.92)  $|\lambda| \le \|T\|_{op}$

for every $\lambda \in \mathbf{C}$ in the spectrum of $T$, as in the preceding
paragraph. If $\lambda \in \mathbf{C}$ is not in the spectrum of $T$, so that
$\lambda I - T$ is invertible on $V$, then $\mu I - T$ is also invertible for every
complex number $\mu$ sufficiently close to $\lambda$, by the remarks in the
previous paragraph. This implies that the spectrum of $T$ is a closed set in the
complex plane. A famous theorem states that the spectrum of $T$ is always nonempty.
The main idea in the proof is that otherwise $(\lambda I - T)^{-1}$ would be a
holomorphic function of $\lambda$ on the complex plane that tends to $0$ as
$|\lambda| \to \infty$.

Let $V_1$ and $V_2$ be Banach spaces, both real or both complex, and with norms
$\|\cdot\|_1$, $\|\cdot\|_2$, respectively. Also let

(6.93)  $B_1 = \{v \in V_1 : \|v\|_1 \le 1\}$

be the closed unit ball in $V_1$. A linear mapping $T$ from $V_1$ into $V_2$ is
said to be compact if the closure of $T(B_1)$ in $V_2$ is a compact set. This is
equivalent to asking that $T(B_1)$ be totally bounded in $V_2$, which means that
for each $\epsilon > 0$, $T(B_1)$ can be covered by finitely many balls of radius
$\epsilon$ in $V_2$. In particular, totally bounded sets are bounded, and hence
compact linear mappings are bounded. It is easy to see that bounded subsets of
finite-dimensional spaces are totally bounded, so that bounded linear mappings of
finite rank are compact. One can also check that the composition of a bounded
linear mapping with a compact linear mapping is compact, where the compact operator
is either first or second in the composition.

Let $\mathcal{CL}(V_1, V_2)$ be the space of compact linear mappings from $V_1$
into $V_2$. This is a linear subspace of the vector space $\mathcal{BL}(V_1, V_2)$
of bounded linear mappings from $V_1$ into $V_2$, which is closed with respect to
the operator norm on $\mathcal{BL}(V_1, V_2)$. This means that if
$\{T_j\}_{j=1}^\infty$ is a sequence of compact linear mappings from $V_1$ into
$V_2$ that converges to a bounded linear mapping $T : V_1 \to V_2$ with respect to
the operator norm, then $T$ is compact too. In particular, $T$ is compact if it is
the limit of a sequence of bounded linear mappings of finite rank with respect to
the operator norm. In some situations, including mappings between Hilbert spaces,
one can show that every compact linear mapping is the limit of a sequence of
bounded linear mappings of finite rank with respect to the operator norm.

Let $T$ be a compact linear mapping from a Banach space $V$ into itself. If
$\lambda$ is a real or complex number, as appropriate, such that $\lambda \ne 0$
and $\lambda I - T$ is not invertible, then it can be shown that $\lambda$ is an
eigenvalue of $T$, and that the corresponding eigenspace is finite-dimensional. It
can also be shown that for each $r > 0$, there are only finitely many eigenvalues
$\lambda$ with $|\lambda| \ge r$.



6.9 Self-adjoint linear operators 

Let $V$ be a real or complex vector space with an inner product
$\langle v, w \rangle$, and suppose that $V$ is complete with respect to the
corresponding norm $\|v\|$, so that $V$ is a Hilbert space. As in Section 3.3, a
bounded linear operator $A$ on $V$ is said to be self-adjoint if

(6.94)  $\langle A(v), w \rangle = \langle v, A(w) \rangle$

for every $v, w \in V$. As before, the identity operator $I$ on $V$ is
self-adjoint, as is the orthogonal projection $P_W$ of $V$ onto a closed linear
subspace $W$ of $V$. The sum of two bounded self-adjoint linear operators on $V$ is
also self-adjoint, and the product of a bounded self-adjoint linear operator on $V$
and a real number is self-adjoint too.

Suppose for the moment that $V$ is a complex Hilbert space. If $A$ is a bounded
self-adjoint linear operator on $V$, then

(6.95)  $\langle A(v), v \rangle = \langle v, A(v) \rangle = \overline{\langle A(v), v \rangle}$

for every $v \in V$, and hence

(6.96)  $\langle A(v), v \rangle \in \mathbf{R}$

for every $v \in V$. Using this, it is easy to see that the eigenvalues of $A$ are
real numbers, as before. Let us check that the spectrum of $A$, as defined in the
previous section, is also contained in the real line under these conditions.
Equivalently, this means that $\lambda I - A$ is invertible on $V$ for every
complex number $\lambda$ with nonzero imaginary part.
Observe that

(6.97)  $\mathrm{Im} \, \langle (\lambda I - A)(v), v \rangle = (\mathrm{Im} \, \lambda) \, \|v\|^2$

for every $v \in V$, by (6.96), so that

(6.98)  $|\langle (\lambda I - A)(v), v \rangle| \ge |\mathrm{Im} \, \lambda| \, \|v\|^2$

for every $v \in V$. The Cauchy-Schwarz inequality implies that

(6.99)  $|\langle (\lambda I - A)(v), v \rangle| \le \|(\lambda I - A)(v)\| \, \|v\|,$

from which we get that

(6.100)  $\|(\lambda I - A)(v)\| \ge |\mathrm{Im} \, \lambda| \, \|v\|$

for every $v \in V$. This is the same type of condition as (6.84) in the previous
section, since $\mathrm{Im} \, \lambda \ne 0$, by hypothesis. In order to show that
$\lambda I - A$ is invertible on $V$, it suffices to check that $\lambda I - A$
maps $V$ onto itself.
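
The lower bound (6.100) can be checked directly for a Hermitian matrix acting on
$\mathbf{C}^2$. The following is a minimal sketch; the matrix A, the number lam,
and the test vectors are hypothetical example data chosen only for illustration.

```python
import math

# Hypothetical self-adjoint (Hermitian) 2 x 2 matrix: A[0][1] is the
# complex conjugate of A[1][0], and the diagonal entries are real.
A = [[2.0 + 0j, 1.0 - 1.0j],
     [1.0 + 1.0j, -1.0 + 0j]]
lam = 0.5 + 0.25j           # nonzero imaginary part

def apply_shifted(v):
    """Return (lam*I - A)(v) for v in C^2."""
    return [lam * v[i] - sum(A[i][k] * v[k] for k in range(2))
            for i in range(2)]

def norm(v):
    return math.sqrt(sum(abs(z) ** 2 for z in v))

test_vectors = [
    [1.0 + 0j, 0j],
    [0j, 1.0 + 0j],
    [1.0 + 2.0j, -3.0 + 0.5j],
    [0.3 - 0.7j, 2.0j],
]
ratios = [norm(apply_shifted(v)) / norm(v) for v in test_vectors]
lower_bound = abs(lam.imag)  # |Im lam|
```

Each entry of ratios should be at least lower_bound, in accordance with (6.100).
Of course, a finite list of test vectors only illustrates the estimate and does
not prove it.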

Suppose for the sake of a contradiction that $W = (\lambda I - A)(V)$ is a proper
linear subspace of $V$. Note that $W$ is a closed linear subspace of $V$, because
of (6.100) and the completeness of $V$, as in the previous section. Let $v$ be any
element of $V \setminus W$, and put

(6.101)  $y = v - P_W(v),$

where $P_W(v)$ is the orthogonal projection of $v$ onto $W$, as in Section 6.4.
Thus $y \ne 0$, because $v \notin W$ and $P_W(v) \in W$, and $y$ is orthogonal to
every element of $W$. The latter condition is the same as saying that

(6.102)  $\langle (\lambda I - A)(v), y \rangle = 0$

for every $v \in V$. In particular, we can apply this to $v = y$, to get that

(6.103)  $\langle (\lambda I - A)(y), y \rangle = 0.$

This implies that $y = 0$, by (6.98), contradicting the hypothesis that $y \ne 0$.
It follows that $W = V$, so that $\lambda I - A$ is invertible on $V$, as desired.

Let $V$ be a real or complex Hilbert space again. A bounded self-adjoint
linear operator $A$ on $V$ is said to be nonnegative if

(6.104)  $\langle A(v), v \rangle \ge 0$

for every $v \in V$. Suppose that $A$ satisfies the strict positivity condition
that

(6.105)  $\langle A(v), v \rangle \ge c \, \|v\|^2$

for some $c > 0$ and every $v \in V$, and let us check that $A$ is invertible on
$V$. By the Cauchy-Schwarz inequality,

(6.106)  $\langle A(v), v \rangle \le \|A(v)\| \, \|v\|$

for every $v \in V$, and hence

(6.107)  $\|A(v)\| \ge c \, \|v\|$

for every $v \in V$. This is the same as (6.84) in this context, and it suffices to
show that $A$ maps $V$ onto itself.

As before, $W = A(V)$ is a closed linear subspace of $V$ under these conditions.
If $W \ne V$, then there is a $y \in V$ such that $y \ne 0$ and $y$ is orthogonal
to every element of $W$. Equivalently, this means that

(6.108)  $\langle A(v), y \rangle = 0$

for every $v \in V$, and for $v = y$ in particular, so that
$\langle A(y), y \rangle = 0$. This implies that $y = 0$, by the strict positivity
of $A$, contradicting the hypothesis that $y \ne 0$. Thus $A(V) = V$, and hence $A$
is invertible on $V$, as desired.

Similarly, if $A$ is a bounded self-adjoint linear operator on $V$ that satisfies
(6.107) for some $c > 0$ and every $v \in V$, then $A$ is invertible on $V$. As
before, $W = A(V)$ is a closed linear subspace of $V$ under these conditions, and
we want to show that $W = V$. Otherwise, there is a $y \in V$ such that $y \ne 0$
and $y$ is orthogonal to every element of $W$, so that

(6.109)  $\langle v, A(y) \rangle = \langle A(v), y \rangle = 0$

for every $v \in V$. This implies that $A(y) = 0$, and hence that $y = 0$, because
of (6.107). Thus $A(V) = V$, so that $A$ is invertible on $V$, as desired.

In analogy with the finite-dimensional case, it can be shown that a compact
self-adjoint linear operator $T$ on $V$ can be diagonalized in an orthonormal basis
for $V$. Using this, one can show that any compact linear mapping between
Hilbert spaces has a Schmidt decomposition as in Section 3.8, but perhaps with
infinite sequences of orthonormal vectors, and coefficients $\lambda_j$ converging
to $0$ as $j \to \infty$. If the $\lambda_j$'s are $p$-summable for some $p > 0$,
then the operator is said to be in the $\mathcal{S}_p$ class.



Chapter 7 



Marcel Riesz' convexity 
theorem 



Let $(a_{j,k})$ be an $n \times n$ matrix of complex numbers, and let $A(x, y)$ be
the bilinear form defined for $x, y \in \mathbf{C}^n$ by

(7.1)  $A(x, y) = \sum_{j=1}^n \sum_{k=1}^n y_j \, a_{j,k} \, x_k.$

For $1 < p < \infty$, let $M_p$ be the quantity

(7.2)  $M_p = \sup\Big\{ |A(x,y)| : x, y \in \mathbf{C}^n, \ \Big(\sum_{k=1}^n |x_k|^p\Big)^{1/p} \le 1, \ \Big(\sum_{j=1}^n |y_j|^{p'}\Big)^{1/p'} \le 1 \Big\},$

where $p'$ denotes the conjugate exponent of $p$, $1/p + 1/p' = 1$. When $p = 1$,
$p' = \infty$, put

(7.3)  $M_1 = \sup\Big\{ |A(x,y)| : x, y \in \mathbf{C}^n, \ \sum_{k=1}^n |x_k| \le 1, \ \max_{1 \le j \le n} |y_j| \le 1 \Big\},$

and when $p = \infty$, $p' = 1$, set

(7.4)  $M_\infty = \sup\Big\{ |A(x,y)| : x, y \in \mathbf{C}^n, \ \max_{1 \le k \le n} |x_k| \le 1, \ \sum_{j=1}^n |y_j| \le 1 \Big\}.$

Theorem 7.5 As a function of $1/p \in [0,1]$, $\log M_p$ is convex.
More precisely, if $1 \le p < q \le \infty$, $0 < t < 1$, $1 < r < \infty$, and

(7.6)  $\frac{1}{r} = \frac{t}{p} + \frac{1-t}{q},$

then

(7.7)  $M_r \le M_p^t \, M_q^{1-t}.$

If $M_p = 0$ for some $p$, then $A = 0$ and $M_p = 0$ for every $p$, and hence we
may as well assume that $A \ne 0$ in the arguments that follow. The special case of
$p = 1$, $q = \infty$ corresponds exactly to the theorem of Schur discussed in
Section 2.5. Note that the analogous inequality holds when the $a_{j,k}$'s are real
numbers, and we use $x, y \in \mathbf{R}^n$ in the definition of $M_p$, by the same
proof.
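We can also describe $M_p$ as

(7.8)  $M_p = \sup\Big\{ \Big( \sum_{j=1}^n \Big| \sum_{k=1}^n a_{j,k} \, x_k \Big|^p \Big)^{1/p} : x \in \mathbf{C}^n, \ \Big(\sum_{k=1}^n |x_k|^p\Big)^{1/p} \le 1 \Big\}$

when $1 \le p < \infty$, and

(7.9)  $M_\infty = \sup\Big\{ \max_{1 \le j \le n} \Big| \sum_{k=1}^n a_{j,k} \, x_k \Big| : x \in \mathbf{C}^n, \ \max_{1 \le k \le n} |x_k| \le 1 \Big\}.$

This definition of $M_p$ is greater than or equal to the previous one by Hölder's
inequality, and for each $x \in \mathbf{C}^n$, there is a $y \in \mathbf{C}^n$ for
which equality holds and $\big(\sum_{j=1}^n |y_j|^{p'}\big)^{1/p'}$ or
$\max_{1 \le j \le n} |y_j|$ is equal to 1, according to whether $p' < \infty$ or
$p' = \infty$. Equivalently, $M_p$ is the operator norm of the linear
transformation on $\mathbf{C}^n$ associated to the matrix $(a_{j,k})$ with respect
to the $p$-norm defined in Section 2.1. Similarly,

(7.10)  $M_p = \sup\Big\{ \Big( \sum_{k=1}^n \Big| \sum_{j=1}^n y_j \, a_{j,k} \Big|^{p'} \Big)^{1/p'} : y \in \mathbf{C}^n, \ \Big(\sum_{j=1}^n |y_j|^{p'}\Big)^{1/p'} \le 1 \Big\}$

when $1 < p \le \infty$, and

(7.11)  $M_1 = \sup\Big\{ \max_{1 \le k \le n} \Big| \sum_{j=1}^n y_j \, a_{j,k} \Big| : y \in \mathbf{C}^n, \ \max_{1 \le j \le n} |y_j| \le 1 \Big\},$

which says that $M_p$ is equal to the operator norm of the dual linear
transformation on $\mathbf{C}^n$, associated to the transpose matrix, and with
respect to the dual norm $\|\cdot\|_{p'}$. One can check that $M_s$ is a continuous
function of $1/s$, $1/s \in [0, 1]$, using the inequalities (1.80), (1.81), (1.83),
and (1.84).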

As in Lemma 1.54, we would like to show that for each $p, q \in [1, \infty]$ there
is a $t \in (0, 1)$ such that (7.7) holds. Fix a real number $r$,
$1 < r < \infty$, and let $r'$ be its conjugate exponent. There exist
$x^0, y^0 \in \mathbf{C}^n$ at which the supremum in the definition (7.2) of $M_r$
is attained, i.e., which satisfy

(7.12)  $|A(x^0, y^0)| = M_r$

and the normalizations

(7.13)  $\Big( \sum_{k=1}^n |x_k^0|^r \Big)^{1/r} = 1$

and

(7.14)  $\Big( \sum_{j=1}^n |y_j^0|^{r'} \Big)^{1/r'} = 1.$

This follows from standard considerations of continuity and compactness.
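Observe that

(7.15)  $|A(x^0, y^0)| = \Big| \sum_{j=1}^n \sum_{k=1}^n y_j^0 \, a_{j,k} \, x_k^0 \Big| = \Big( \sum_{j=1}^n |y_j^0|^{r'} \Big)^{1/r'} \Big( \sum_{j=1}^n \Big| \sum_{k=1}^n a_{j,k} \, x_k^0 \Big|^r \Big)^{1/r}.$

The first step uses only the definition of $A$. If the second equality were
replaced with $\le$, then it would be a consequence of Hölder's inequality. If
equality did not hold, then we could replace $y^0$ with an element of
$\mathbf{C}^n$ which satisfies (7.14) and for which equality does hold, increasing
the value of $|A(x^0, y^0)|$. Similarly,

(7.16)  $|A(x^0, y^0)| = \Big| \sum_{j=1}^n \sum_{k=1}^n y_j^0 \, a_{j,k} \, x_k^0 \Big| = \Big( \sum_{k=1}^n \Big| \sum_{j=1}^n y_j^0 \, a_{j,k} \Big|^{r'} \Big)^{1/r'} \Big( \sum_{k=1}^n |x_k^0|^r \Big)^{1/r}.$

Because of the second equality in (7.15), there is a $\mu \ge 0$ such that

(7.17)  $\Big| \sum_{k=1}^n a_{j,k} \, x_k^0 \Big| = \mu \, |y_j^0|^{r'-1}$

for $j = 1, 2, \ldots, n$. This can be derived from the proof of Hölder's
inequality, by analyzing the conditions in which equality holds. Similarly, there
is a $\nu \ge 0$ such that

(7.18)  $\Big| \sum_{j=1}^n y_j^0 \, a_{j,k} \Big| = \nu \, |x_k^0|^{r-1}$

for $k = 1, 2, \ldots, n$. From (7.15) and (7.16), we have that

(7.19)  $M_r = \Big( \sum_{j=1}^n \Big| \sum_{k=1}^n a_{j,k} \, x_k^0 \Big|^r \Big)^{1/r} = \Big( \sum_{k=1}^n \Big| \sum_{j=1}^n y_j^0 \, a_{j,k} \Big|^{r'} \Big)^{1/r'},$

using also (7.12), (7.13), and (7.14). Substituting (7.17) in the first equality,
we get that

(7.20)  $M_r = \mu \Big( \sum_{j=1}^n |y_j^0|^{r(r'-1)} \Big)^{1/r}.$

Because $r(r' - 1) = r'$, since $1/r + 1/r' = 1$, we can apply (7.14) to get that
$M_r = \mu$. For the same reasons, $M_r = \nu$.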
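Now suppose that $p$, $q$, and $t$ are real numbers such that
$1 \le p < q \le \infty$, $0 < t < 1$, and $1/r = t/p + (1-t)/q$. Thus
$p, q' < \infty$, where $q'$ is the conjugate exponent of $q$. Observe that

(7.21)  $\Big( \sum_{j=1}^n \Big| \sum_{k=1}^n a_{j,k} \, x_k^0 \Big|^p \Big)^{1/p} \le M_p \Big( \sum_{k=1}^n |x_k^0|^p \Big)^{1/p}$

and

(7.22)  $\Big( \sum_{k=1}^n \Big| \sum_{j=1}^n y_j^0 \, a_{j,k} \Big|^{q'} \Big)^{1/q'} \le M_q \Big( \sum_{j=1}^n |y_j^0|^{q'} \Big)^{1/q'}.$

Applying the previous computations, we get that

(7.23)  $M_r \Big( \sum_{j=1}^n |y_j^0|^{p(r'-1)} \Big)^{1/p} \le M_p \Big( \sum_{k=1}^n |x_k^0|^p \Big)^{1/p}$

and

(7.24)  $M_r \Big( \sum_{k=1}^n |x_k^0|^{q'(r-1)} \Big)^{1/q'} \le M_q \Big( \sum_{j=1}^n |y_j^0|^{q'} \Big)^{1/q'}.$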

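We are going to need some identities with indices. Let us first check that

(7.25)  $t \Big( \frac{1}{r} - \frac{1}{p} \Big) = (1 - t) \Big( \frac{1}{r'} - \frac{1}{q'} \Big).$

Because $1/r = t/p + (1-t)/q$, we have that

(7.26)  $\frac{1}{r} - \frac{1}{p} = (t - 1) \Big( \frac{1}{p} - \frac{1}{q} \Big).$

Similarly, $1/r' = t/p' + (1-t)/q'$, and

(7.27)  $\frac{1}{r'} - \frac{1}{q'} = t \Big( \frac{1}{p'} - \frac{1}{q'} \Big) = t \Big( \frac{1}{q} - \frac{1}{p} \Big).$

This proves (7.25).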
Suppose that

(7.28)  $\frac{t}{r} = \frac{1-t}{r'}.$

This implies that

(7.29)  $\frac{t}{p} = \frac{1-t}{q'},$

by (7.25). Hence

(7.30)  $r' - 1 = \frac{1-t}{t}$  and  $r - 1 = \frac{t}{1-t},$

because $r(r' - 1) = r'$ and $r'(r - 1) = r$, since $1/r + 1/r' = 1$. Therefore

(7.31)  $p(r' - 1) = q'$  and  $q'(r - 1) = p.$

We can take the $t$th and $(1-t)$th powers of (7.23) and (7.24), respectively,
and then multiply to get

(7.32)  $M_r \Big( \sum_{j=1}^n |y_j^0|^{p(r'-1)} \Big)^{t/p} \Big( \sum_{k=1}^n |x_k^0|^{q'(r-1)} \Big)^{(1-t)/q'} \le M_p^t \, M_q^{1-t} \Big( \sum_{k=1}^n |x_k^0|^p \Big)^{t/p} \Big( \sum_{j=1}^n |y_j^0|^{q'} \Big)^{(1-t)/q'}.$

Assuming (7.28), this reduces to

(7.33)  $M_r \le M_p^t \, M_q^{1-t},$

because the factors involving $x^0$ and $y^0$ on the left and right sides of (7.32)
exactly match up under these conditions, by the computations in the preceding
paragraph. To summarize, for each $p, q$ with $1 \le p < q \le \infty$, there is a
$t \in (0, 1)$ such that (7.28) holds when $r$ is given by
$1/r = t/p + (1-t)/q$. For this choice of $t$, we get the inequality (7.33).
Theorem 7.5 now follows from Lemma 1.54, with the small adaptation to functions on
closed intervals.
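
As a numerical sanity check of the special case $p = 1$, $q = \infty$,
$t = 1/2$, $r = 2$ of Theorem 7.5, here is a minimal sketch for a hypothetical
3 x 3 real matrix. It relies on two standard facts that are assumptions relative
to the text: with the description (7.8), $M_1$ is the largest column sum of the
$|a_{j,k}|$'s and $M_\infty$ is the largest row sum, while $M_2$ is the square
root of the largest eigenvalue of the transpose of the matrix times the matrix.
The latter is estimated from below here by power iteration.

```python
import math

# Hypothetical example matrix (a_{j,k}), rows indexed by j, columns by k.
A = [[1.0, -2.0, 0.0],
     [3.0, 1.0, 1.0],
     [0.0, 2.0, -1.0]]
n = len(A)

def apply(M, x):
    return [sum(M[i][k] * x[k] for k in range(n)) for i in range(n)]

def transpose(M):
    return [[M[j][i] for j in range(n)] for i in range(n)]

def norm2(x):
    return math.sqrt(sum(c * c for c in x))

# Operator norms on l^1 and l^infinity: max column sum and max row sum.
M_1 = max(sum(abs(A[j][k]) for j in range(n)) for k in range(n))
M_inf = max(sum(abs(A[j][k]) for k in range(n)) for j in range(n))

# Power iteration on A^T A; the ratio below never exceeds the true M_2.
At = transpose(A)
x = [1.0] * n
for _ in range(500):
    y = apply(At, apply(A, x))
    s = norm2(y)
    x = [c / s for c in y]
M_2_estimate = norm2(apply(A, x)) / norm2(x)

schur_bound = math.sqrt(M_1 * M_inf)   # right side of (7.7) with t = 1/2
```

In this example M_1 and M_inf are both equal to 5, so the bound in (7.7) says
that the operator norm with respect to the Euclidean norm is at most 5.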



Chapter 8 

Some dyadic analysis 



8.1 Dyadic intervals 

Normally, a reference to "the unit interval" in the real line might suggest the 
closed interval [0, 1], but here it will be convenient to use [0, 1) instead, for minor 
technical reasons. 

Definition 8.1 The dyadic subintervals of $[0, 1)$ are the intervals of the form
$[j \, 2^{-k}, (j+1) \, 2^{-k})$, where $j$ and $k$ are nonnegative integers, and
$j + 1 \le 2^k$. In particular, the length of a dyadic interval in $[0, 1)$ is of
the form $2^{-k}$, where $k$ is a nonnegative integer.

The dyadic intervals in R can be defined in the same way, with arbitrary 
integers j and k. The half-open, half-closed condition leads to nice properties 
in terms of disjointness, as in the next two lemmas, whose simple proofs are left 
as exercises. 

Lemma 8.2 For each nonnegative integer $k$, $[0, 1)$ is the union of the dyadic
subintervals of length $2^{-k}$, and these subintervals are pairwise disjoint.

Lemma 8.3 If $J_1$ and $J_2$ are two dyadic subintervals of $[0, 1)$, then either
$J_1 \subseteq J_2$, or $J_2 \subseteq J_1$, or $J_1 \cap J_2 = \emptyset$.

More precisely, if $J_1$, $J_2$ are dyadic subintervals of $[0, 1)$ such that the
length of $J_2$ is less than or equal to the length of $J_1$, then either
$J_2 \subseteq J_1$ or $J_1 \cap J_2 = \emptyset$.

Lemma 8.4 If $J$ is a dyadic subinterval of $[0, 1)$ of length $2^{-k}$, and if
$n$ is an integer greater than $k$, then $J$ is the union of the dyadic
subintervals of $J$ of length $2^{-n}$, and these subintervals are pairwise
disjoint. Every dyadic subinterval of $[0, 1)$ of length $2^{-n}$ is contained in a
unique dyadic subinterval of $[0, 1)$ of length $2^{-k}$ when $n > k$.
This is easy to see. 
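
For readers who like to experiment, Lemmas 8.2 and 8.3 can be checked by brute
force for small values of $k$. The following is a minimal sketch using exact
rational arithmetic; the function names and the cutoff $k \le 4$ are choices made
only for this illustration.

```python
from fractions import Fraction

def dyadic(j, k):
    """Endpoints (a, b) of [j 2^{-k}, (j+1) 2^{-k}) as exact fractions."""
    return (Fraction(j, 2 ** k), Fraction(j + 1, 2 ** k))

def contained(inner, outer):
    """True when the half-open interval `inner` lies inside `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def disjoint(i1, i2):
    """True when two half-open intervals [a, b) do not meet."""
    return i1[1] <= i2[0] or i2[1] <= i1[0]

# All dyadic subintervals of [0, 1) of length at least 2^{-4}.
intervals = [dyadic(j, k) for k in range(5) for j in range(2 ** k)]

# Lemma 8.3: any two of them are nested or disjoint.
nested_or_disjoint = all(
    contained(a, b) or contained(b, a) or disjoint(a, b)
    for a in intervals for b in intervals
)

# Lemma 8.2 for k = 3: eight pairwise disjoint pieces of total length 1.
level3 = [dyadic(j, 3) for j in range(8)]
total_length = sum(b - a for a, b in level3)
pairwise_disjoint = all(
    disjoint(level3[i], level3[m])
    for i in range(8) for m in range(i + 1, 8)
)
```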
Lemma 8.5 If $\mathcal{F}$ is an arbitrary collection of dyadic subintervals of
$[0, 1)$, then there is a subcollection $\mathcal{F}_0$ of $\mathcal{F}$ such that

(8.6)  $\bigcup_{J \in \mathcal{F}_0} J = \bigcup_{J \in \mathcal{F}} J$

and the elements of $\mathcal{F}_0$ are pairwise disjoint.

To prove this, we take $\mathcal{F}_0$ to be the set of maximal elements of
$\mathcal{F}$, i.e., the set of $J \in \mathcal{F}$ such that $J \subseteq J'$ for
some $J' \in \mathcal{F}$ only when $J' = J$. Every interval in $\mathcal{F}$ is
contained in a maximal interval in $\mathcal{F}$, since every dyadic subinterval of
$[0, 1)$ is contained in only finitely many dyadic subintervals of $[0, 1)$. Thus
every element of $\mathcal{F}$ is contained in an element of $\mathcal{F}_0$, which
implies (8.6). Any two maximal elements of $\mathcal{F}$ which are distinct are
disjoint, by Lemma 8.3, which implies the second property of $\mathcal{F}_0$ in the
lemma.

Let $f$ be a real or complex-valued function on the unit interval $[0, 1)$ which
is sufficiently well-behaved for integrals of $f$ over subintervals of $[0, 1)$ to
be defined. One is welcome to restrict one's attention to step functions here, and
we shall simplify this a bit further in a moment. For each nonnegative integer
$k$, let $E_k(f)$ be the function on $[0, 1)$ defined by

(8.7)  $E_k(f)(x) = 2^k \int_J f(y) \, dy,$

where $J$ is the dyadic subinterval of $[0, 1)$ with length $2^{-k}$ that contains
$x$, so that $E_k(f)(x)$ is the average of $f$ over $J$. Of course, $E_k(f)$ is
linear in $f$.

Lemma 8.8 (a) For each $f$, $E_k(f)$ is constant on the dyadic subintervals of
[0, 1) of length $2^{-k}$.

(b) If $f$ is constant on the dyadic subintervals of [0, 1) of length $2^{-k}$, then
$E_k(f) = f$.

(c) For any $f$, $E_j(E_k(f)) = E_k(f)$ and $E_k(E_j(f)) = E_k(f)$ when $j \ge k$.

(d) If $g$ is a function on [0, 1) which is constant on the dyadic subintervals
of [0, 1) of length $2^{-k}$, then $E_k(g f) = g E_k(f)$ for each $f$.

This is easy to verify, directly from the definitions. Note that the first part
of (c) holds simply because $E_k(f)$ is constant on dyadic subintervals of [0, 1)
of length $2^{-j}$ when $j \ge k$. In the second part of (c), one is first averaging $f$
on the smaller dyadic intervals of length $2^{-j}$ to get $E_j(f)$, and then averaging
the result on the larger dyadic intervals of length $2^{-k}$ to get $E_k(E_j(f))$, and
the conclusion is that this is the same as averaging over the dyadic intervals of
length $2^{-k}$ directly.
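
For readers who like to experiment, the averaging operators can be tried out on a computer. The following sketch rests on an assumption made only for illustration: a step function constant on the dyadic intervals of length $2^{-n}$ is stored as the list of its $2^n$ values. The name `expect` is hypothetical.

```python
from fractions import Fraction


def expect(values, k):
    """Apply E_k to a step function stored as its 2**n values on the
    dyadic intervals of length 2**-n (len(values) must be a power of 2)."""
    n = len(values).bit_length() - 1
    if k >= n:
        return list(values)           # part (b): f is already constant there
    block = 2 ** (n - k)              # number of cells in an interval of length 2**-k
    out = []
    for start in range(0, len(values), block):
        avg = sum(values[start:start + block], Fraction(0)) / block
        out.extend([avg] * block)
    return out


f = [Fraction(v) for v in (3, -1, 4, 1, -5, 9, 2, 6)]
g = [Fraction(v) for v in (2, 2, 2, 2, -3, -3, -3, -3)]   # constant at level k = 1
gf = [a * b for a, b in zip(g, f)]

print(expect(f, 1) == expect(expect(f, 2), 1))   # part (c), second identity
print(expect(gf, 1) == [a * b for a, b in zip(g, expect(f, 1))])   # part (d)
```

Exact rational arithmetic is used so that the identities of Lemma 8.8 can be compared with `==` rather than up to rounding.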

Definition 8.9 A function f on [0, 1) is a dyadic step function if it is a finite 
linear combination of indicator functions of dyadic subintervals of [0, 1). 

Lemma 8.10 Let f be a function on [0, 1). The following are equivalent: 
(a) f is a dyadic step function; 
(b) There is a nonnegative integer $k$ such that $f$ is constant on every dyadic
subinterval of [0, 1) of length $2^{-k}$;

(c) $E_k(f) = f$ for some nonnegative integer $k$, and hence for all sufficiently
large integers $k$.

One can check this using the previous lemma. From now on, one is welcome 
to restrict one's attention to dyadic step functions in this chapter. 

Lemma 8.11 For any functions $f$, $g$ on [0, 1) and nonnegative integer $j$,

(8.12)    $\int_{[0,1)} E_j(f)(x)\, g(x)\,dx = \int_{[0,1)} f(x)\, E_j(g)(x)\,dx$
                  $= \int_{[0,1)} E_j(f)(x)\, E_j(g)(x)\,dx.$

Lemma 8.13 For any functions $f$, $g$ on [0, 1) and positive integers $j$, $k$ with
$j > k$,

(8.14)    $\int_{[0,1)} E_k(f)(x)\,\bigl(E_j(g)(x) - E_{j-1}(g)(x)\bigr)\,dx = 0$

and

(8.15)    $\int_{[0,1)} \bigl(E_j(f)(x) - E_{j-1}(f)(x)\bigr)\,\bigl(E_k(g)(x) - E_{k-1}(g)(x)\bigr)\,dx = 0.$

The computations for these two lemmas are straightforward and left to the 
reader. 

Let $I$ be a dyadic subinterval of [0, 1), and let $I_l$ and $I_r$ be the two dyadic
subintervals of $I$ of half the size of $I$. The Haar function $h_I(x)$ on [0, 1) associated
to the interval $I$ is defined by

(8.16)    $h_I(x) = -|I|^{-1/2}$ when $x \in I_l$
                  $= |I|^{-1/2}$ when $x \in I_r$
                  $= 0$ when $x \in [0, 1) \setminus I$.

Observe that

(8.17)    $\int_{[0,1)} h_I(x)\,dx = 0$

and

(8.18)    $\int_{[0,1)} h_I(x)^2\,dx = 1.$

In addition, there is a special Haar function $h_0(x)$ on [0, 1) defined by $h_0(x) = 1$
for every $x \in [0, 1)$, for which we also have

(8.19)    $\int_{[0,1)} h_0(x)^2\,dx = 1.$
If $I$ and $J$ are distinct dyadic subintervals of [0, 1), then $h_I$ and $h_J$ satisfy
the orthogonality property

(8.20)    $\int_{[0,1)} h_I(x)\, h_J(x)\,dx = 0.$

For if $I$ and $J$ are disjoint, then $h_I(x)\, h_J(x) = 0$ for every $x \in [0, 1)$, and the
integral vanishes trivially. Otherwise, one of the intervals $I$ and $J$ is contained
in the other, and we may as well assume that $J \subseteq I$, since the two cases are
completely symmetric. Because $J \ne I$, $J \subseteq I_l$ or $J \subseteq I_r$, $h_I$ is constant on $J$,
and (8.20) follows from (8.17) applied to $h_J$. If $I$ is any dyadic subinterval of [0, 1), then

(8.21)    $\int_{[0,1)} h_0(x)\, h_I(x)\,dx = 0,$

by (8.17).

For each function $f$ on [0, 1) and nonnegative integer $k$,

(8.22)    $E_k(f) = (f, h_0)\, h_0 + \sum_{|I| \ge 2^{-k+1}} (f, h_I)\, h_I.$

Here the sum is taken over all dyadic subintervals $I$ of [0, 1) with $|I| \ge 2^{-k+1}$,
and is interpreted as being 0 when $k = 0$. Also, $(f, h_0)$, $(f, h_I)$ are the integrals
of $f$ times $h_0$, $h_I$, respectively. In particular, dyadic step functions are finite
linear combinations of Haar functions. If $f$ is a dyadic step function on [0, 1),
then

(8.23)    $f = (f, h_0)\, h_0 + \sum_I (f, h_I)\, h_I,$

where the sum is taken over all dyadic subintervals $I$ of [0, 1). The sum is
actually a finite sum, since $(f, h_I) = 0$ for all but finitely many $I$. This expression for
$f$ follows from the orthonormality conditions for the Haar functions described
earlier.
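
The Haar expansion (8.23) can be checked numerically on a small example. The sketch below assumes, purely for illustration, that a dyadic step function is stored as the list of its $2^n$ values on the dyadic intervals of length $2^{-n}$; the helper names `haar`, `inner` and `haar_expand` are hypothetical. Floating-point arithmetic is used because $|I|^{-1/2}$ is irrational for some intervals.

```python
import math


def haar(k, i, n):
    """Values of h_I on the 2**n grid cells, for I = [i 2**-k, (i+1) 2**-k), k < n."""
    cells = 2 ** (n - k)              # grid cells inside I
    height = 2.0 ** (k / 2)           # |I| ** (-1/2)
    h = [0.0] * (2 ** n)
    for c in range(cells):
        # negative on the left half I_l, positive on the right half I_r
        h[i * cells + c] = -height if c < cells // 2 else height
    return h


def inner(u, v):
    """Integral over [0, 1) of the product of two step functions on one grid."""
    return sum(a * b for a, b in zip(u, v)) / len(u)


def haar_expand(f):
    """Rebuild f from (f, h_0) h_0 plus the sum of (f, h_I) h_I, as in (8.23)."""
    n = len(f).bit_length() - 1
    total = [inner(f, [1.0] * len(f))] * len(f)      # the (f, h_0) h_0 term
    for k in range(n):
        for i in range(2 ** k):
            h = haar(k, i, n)
            coeff = inner(f, h)
            total = [t + coeff * hv for t, hv in zip(total, h)]
    return total


f = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, 6.0]
rebuilt = haar_expand(f)
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(f, rebuilt)))
```

Only intervals $I$ with $|I| \ge 2^{-n+1}$ enter the loop, in line with (8.22) for $k = n$ and the fact that $E_n(f) = f$ for such $f$.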



8.2 Maximal functions 

As mentioned in the previous section, one is welcome to restrict one's attention
to real or complex-valued functions on [0, 1) that are dyadic step functions in
this chapter. The dyadic maximal function $M(f)$ associated to a function $f$ on
[0, 1) is defined by

(8.24)    $M(f)(x) = \sup_{k \ge 0} |E_k(f)(x)|.$

Equivalently, $M(f)(x)$ is equal to

(8.25)    $\sup \Bigl\{ \Bigl| \frac{1}{|J|} \int_J f(y)\,dy \Bigr| : J$ is a dyadic subinterval of [0, 1) and $x \in J \Bigr\}.$
For each nonnegative integer $l$, put

(8.26)    $M_l(f)(x) = \max_{0 \le k \le l} |E_k(f)(x)|,$

which is the same as

(8.27)    $M_l(f)(x) = \max \Bigl\{ \Bigl| \frac{1}{|J|} \int_J f(y)\,dy \Bigr| : J \subseteq [0, 1),\ x \in J,$ and $|J| \ge 2^{-l} \Bigr\},$

where the maximum is again taken over dyadic subintervals $J$ of [0, 1). Thus

(8.28)    $M_l(f) \le M_r(f)$ when $r \ge l$

and

(8.29)    $M(f)(x) = \sup_{l \ge 0} M_l(f)(x)$ for every $x \in [0, 1)$.

For any pair of functions $f_1$, $f_2$ on [0, 1),

(8.30)    $M(f_1 + f_2) \le M(f_1) + M(f_2)$

and

(8.31)    $M_l(f_1 + f_2) \le M_l(f_1) + M_l(f_2)$

for each $l \ge 0$. Also,

(8.32)    $M(c f) = |c|\, M(f)$

and

(8.33)    $M_l(c f) = |c|\, M_l(f)$

for any function $f$ and constant $c$. Thus $M(f)$, $M_l(f)$ are sublinear in $f$.

Lemma 8.34 If $f$ is constant on the dyadic subintervals of [0, 1) of length $2^{-l}$,
then $M(f)$ is constant on the dyadic subintervals of [0, 1) of length $2^{-l}$, and
$M(f) = M_l(f)$. For any function $f$,

(8.35)    $M_l(f) = M(E_l(f)),$

and $M_l(f)$ is constant on dyadic intervals of length $2^{-l}$.

Exercise.

Corollary 8.36 If $f$ is a dyadic step function, then $M(f)$ is too, and $M(f) =
M_j(f)$ for sufficiently large $j$.

Lemma 8.37 (Supremum bound for M(f)) If $|f(x)| \le A$ for some $A \ge 0$
and every $x \in [0, 1)$, then $M(f)(x) \le A$ for every $x \in [0, 1)$.

This is an easy consequence of the definitions. Lemma 8.37 also works if
$|f(x)| \le A$ for every $x \in [0, 1)$ except for a small set that does not affect the
integrals.
Proposition 8.38 (Weak-type estimate for M(f)) For every $\lambda > 0$,

(8.39)    $|\{x \in [0, 1) : M(f)(x) > \lambda\}| \le \frac{1}{\lambda} \int_{[0,1)} |f(w)|\,dw.$

The left-hand side of (8.39) refers to the measure of the set in question, the
meaning of which is clarified by the proof.

Let $\lambda > 0$ be given, let $\mathcal{F}$ be the collection of dyadic subintervals
$L$ of [0, 1) such that

(8.40)    $\Bigl| \frac{1}{|L|} \int_L f(y)\,dy \Bigr| > \lambda,$

and let us check that

(8.41)    $\{x \in [0, 1) : M(f)(x) > \lambda\} = \bigcup_{L \in \mathcal{F}} L.$

If $x \in [0, 1)$ and $M(f)(x) > \lambda$, then there is a dyadic interval $L$ in [0, 1) such
that $x \in L$ and $L$ satisfies (8.40), because of (8.25), and hence the left side of
(8.41) is contained in the right side of (8.41). Conversely, if $L \in \mathcal{F}$, then

(8.42)    $M(f)(x) \ge \Bigl| \frac{1}{|L|} \int_L f(y)\,dy \Bigr| > \lambda$

for every $x \in L$, and therefore $L$ is contained in the left side of (8.41). Thus the
right side of (8.41) is contained in the left side, and (8.41) follows.

As in Lemma 8.5, if $\mathcal{F}_0$ consists of the maximal elements of $\mathcal{F}$, then

(8.43)    $\bigcup_{L \in \mathcal{F}_0} L = \bigcup_{L \in \mathcal{F}} L,$

and the intervals in $\mathcal{F}_0$ are pairwise disjoint. Thus

(8.44)    $|\{x \in [0, 1) : M(f)(x) > \lambda\}| = \sum_{L \in \mathcal{F}_0} |L|.$

If $f$ is a dyadic step function which is constant on the dyadic subintervals of
[0, 1) of length $2^{-l}$, then $M(f)$ is constant on the dyadic intervals of length $2^{-l}$,
and the elements of $\mathcal{F}_0$ have length $\ge 2^{-l}$.

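Each interval $L \in \mathcal{F}_0$ satisfies (8.40), which gives

(8.45)    $|L| < \frac{1}{\lambda} \Bigl| \int_L f(y)\,dy \Bigr| \le \frac{1}{\lambda} \int_L |f(y)|\,dy,$

and hence

(8.46)    $\sum_{L \in \mathcal{F}_0} |L| \le \sum_{L \in \mathcal{F}_0} \frac{1}{\lambda} \int_L |f(y)|\,dy = \frac{1}{\lambda} \int_{\bigcup_{L \in \mathcal{F}_0} L} |f(y)|\,dy,$

using the disjointness of the intervals $L \in \mathcal{F}_0$. Therefore

(8.47)    $|\{x \in [0, 1) : M(f)(x) > \lambda\}| \le \frac{1}{\lambda} \int_{\{y \in [0,1) : M(f)(y) > \lambda\}} |f(y)|\,dy,$

which implies (8.39).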
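
The weak-type estimate (8.39) can be tested on sample data. The sketch below assumes, for illustration only, that a dyadic step function is stored as the list of its $2^n$ values on the dyadic intervals of length $2^{-n}$, in which case $M(f) = M_n(f)$ by Lemma 8.34. The names `maximal` and `measure_above` are hypothetical.

```python
from fractions import Fraction


def maximal(values):
    """Dyadic maximal function of a step function stored as 2**n cell values."""
    n = len(values).bit_length() - 1
    best = [Fraction(0)] * len(values)
    for k in range(n + 1):                       # interval lengths 2**-k, k = 0..n
        block = 2 ** (n - k)
        for start in range(0, len(values), block):
            avg = abs(sum(values[start:start + block], Fraction(0)) / block)
            for c in range(start, start + block):
                best[c] = max(best[c], avg)
    return best


def measure_above(values, lam):
    """Measure of the set where the step function exceeds lam."""
    return Fraction(sum(1 for v in values if v > lam), len(values))


f = [Fraction(v) for v in (3, -1, 4, 1, -5, 9, 2, 6)]
Mf = maximal(f)
l1_norm = sum(abs(v) for v in f) / len(f)        # integral of |f| over [0, 1)

for lam in (1, 2, 3, 4, 5, 8):
    print(lam, measure_above(Mf, lam), l1_norm / lam)   # left and right sides of (8.39)
```

For this sample, the left column never exceeds the right one, as (8.39) predicts.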
Lemma 8.48 For each $\lambda > 0$,

(8.49)    $|\{x \in [0, 1) : M(f)(x) > 2\lambda\}| \le \frac{1}{\lambda} \int_{\{u \in [0,1) : |f(u)| > \lambda\}} |f(u)|\,du.$

Let $\lambda > 0$ be given, and put

(8.50)    $f_1(x) = f(x)$ when $|f(x)| \le \lambda$, $f_1(x) = 0$ when $|f(x)| > \lambda$,

and

(8.51)    $f_2(x) = f(x)$ when $|f(x)| > \lambda$, $f_2(x) = 0$ when $|f(x)| \le \lambda$.

Thus $f(x) = f_1(x) + f_2(x)$ and $M(f_1)(x) \le \lambda$ for every $x \in [0, 1)$, by Lemma 8.37.
This implies that

(8.52)    $M(f)(x) \le \lambda + M(f_2)(x)$

for every $x$, and hence

(8.53)    $|\{x \in [0, 1) : M(f)(x) > 2\lambda\}| \le |\{x \in [0, 1) : M(f_2)(x) > \lambda\}|.$

We can apply Proposition 8.38 with $f$ replaced by $f_2$ to get that

(8.54)    $|\{x \in [0, 1) : M(f_2)(x) > \lambda\}| \le \frac{1}{\lambda} \int_{[0,1)} |f_2(u)|\,du,$

and the lemma follows.

Lemma 8.55 If $g(x)$ is a nonnegative real-valued function on [0, 1), and $p$ is a
positive real number, then

(8.56)    $\int_{[0,1)} g(x)^p\,dx = \int_0^\infty p \lambda^{p-1}\, |\{x \in [0, 1) : g(x) > \lambda\}|\,d\lambda.$

One can see this by integrating $p \lambda^{p-1}$ on the set

(8.57)    $\{(x, \lambda) \in \mathbf{R}^2 : x \in [0, 1),\ 0 < \lambda < g(x)\}$

in two ways: first in $\lambda$ and then in $x$, and first in $x$ and then in $\lambda$.
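
For a step function, both sides of (8.56) are finite sums, so the identity can be checked exactly. The following sketch assumes that $g$ is stored as a list of nonnegative values on equal-length cells and that $p$ is a positive integer, so that rational arithmetic suffices; the names `layer_cake` and `direct` are hypothetical.

```python
from fractions import Fraction


def direct(values, p):
    """Left side of (8.56): the integral of g**p over [0, 1)."""
    return sum(Fraction(v) ** p for v in values) / len(values)


def layer_cake(values, p):
    """Right side of (8.56) for a step function with nonnegative values.

    Between consecutive levels a < b, the set {g > lam} is the same as
    {g > a}, and the integral of p * lam**(p-1) over [a, b] is b**p - a**p.
    """
    levels = sorted(set(values) | {0})
    total = Fraction(0)
    for a, b in zip(levels, levels[1:]):
        measure = Fraction(sum(1 for v in values if v > a), len(values))
        total += (Fraction(b) ** p - Fraction(a) ** p) * measure
    return total


g = [3, Fraction(19, 8), 4, Fraction(5, 2), 5, 9, 4, 6]
print(direct(g, 3) == layer_cake(g, 3))
```

Above the largest value of $g$ the set $\{g > \lambda\}$ is empty, which is why the sum stops at the top level.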
Proposition 8.58 For each real number $p > 1$,

(8.59)    $\int_{[0,1)} M(f)(x)^p\,dx \le \frac{2^p\, p}{p - 1} \int_{[0,1)} |f(y)|^p\,dy.$

To prove this, we apply Lemma 8.55 with $g = M(f)$ to get that

(8.60)    $\int_{[0,1)} M(f)(x)^p\,dx = \int_0^\infty p \lambda^{p-1}\, |\{x \in [0, 1) : M(f)(x) > \lambda\}|\,d\lambda.$

By (8.49) with $\lambda$ replaced by $\lambda/2$,

(8.61)    $\int_{[0,1)} M(f)(x)^p\,dx \le \int_0^\infty p \lambda^{p-1} \Bigl( \frac{2}{\lambda} \int_{\{u \in [0,1) : |f(u)| > \lambda/2\}} |f(u)|\,du \Bigr)\,d\lambda$
                  $= \int_0^\infty \int_{\{u \in [0,1) : |f(u)| > \lambda/2\}} 2 p \lambda^{p-2}\, |f(u)|\,du\,d\lambda.$

Interchanging the order of integration leads to

(8.62)    $\int_{[0,1)} M(f)(x)^p\,dx \le \int_{[0,1)} \int_0^{2|f(u)|} 2 p \lambda^{p-2}\, |f(u)|\,d\lambda\,du.$

Because $p > 1$,

(8.63)    $\int_{[0,1)} M(f)(x)^p\,dx \le \int_{[0,1)} 2 p\, (p - 1)^{-1}\, (2 |f(u)|)^{p-1}\, |f(u)|\,du$
                  $= \frac{2^p\, p}{p - 1} \int_{[0,1)} |f(u)|^p\,du,$

as desired.



8.3 Square functions 

The dyadic square function $S(f)$ associated to a function $f$ on [0, 1) is defined
by

(8.64)    $S(f)(x) = \Bigl( |E_0(f)(x)|^2 + \sum_{j=1}^\infty |E_j(f)(x) - E_{j-1}(f)(x)|^2 \Bigr)^{1/2}.$

For each nonnegative integer $l$, put

(8.65)    $S_l(f)(x) = \Bigl( |E_0(f)(x)|^2 + \sum_{j=1}^l |E_j(f)(x) - E_{j-1}(f)(x)|^2 \Bigr)^{1/2},$

where the sum on the right side is interpreted as being 0 when $l = 0$. Thus

(8.66)    $S_l(f)(x) \le S_p(f)(x)$ when $l \le p$

and $S(f)(x) = \sup_{l \ge 0} S_l(f)(x)$. It is easy to see that $S(f)$, $S_l(f)$ are sublinear
in $f$, in the sense that

(8.67)    $S(f_1 + f_2) \le S(f_1) + S(f_2)$

and $S(c f) = |c|\, S(f)$ for all functions $f_1$, $f_2$, and $f$ on [0, 1) and all constants
$c$, and similarly for $S_l(f)$.
Lemma 8.68 If $f$ is constant on the dyadic subintervals of [0, 1) of length $2^{-l}$,
then $S(f)$ is constant on the dyadic subintervals of [0, 1) of length $2^{-l}$, and
$S(f) = S_l(f)$. For any $f$,

(8.69)    $S_l(f) = S(E_l(f)),$

and $S_l(f)$ is constant on dyadic intervals of length $2^{-l}$.

Exercise.

Corollary 8.70 If $f$ is a dyadic step function on [0, 1), then $S(f)$ is too, and
$S(f) = S_j(f)$ for sufficiently large $j$.

Lemma 8.71 For any function $f$ on [0, 1),

(8.72)    $\int_{[0,1)} S_l(f)(x)^2\,dx = \int_{[0,1)} |E_l(f)(x)|^2\,dx$

for every $l \ge 0$, and

(8.73)    $\int_{[0,1)} S(f)(x)^2\,dx = \int_{[0,1)} |f(x)|^2\,dx.$

Of course

(8.74)    $E_l(f) = E_0(f) + \sum_{j=1}^l \bigl(E_j(f) - E_{j-1}(f)\bigr),$

and to prove the lemma one uses the orthogonality conditions in Lemma 8.13.
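
The identities (8.72) and (8.73) lend themselves to an exact check on a small example. The sketch assumes, for illustration, that a real-valued dyadic step function is stored as the list of its $2^n$ values on the dyadic intervals of length $2^{-n}$; `expect`, `square_function_sq` and `integral` are hypothetical helper names.

```python
from fractions import Fraction


def expect(values, k):
    """E_k of a step function stored as 2**n cell values."""
    n = len(values).bit_length() - 1
    if k >= n:
        return list(values)
    block = 2 ** (n - k)
    out = []
    for start in range(0, len(values), block):
        avg = sum(values[start:start + block], Fraction(0)) / block
        out.extend([avg] * block)
    return out


def square_function_sq(values, l):
    """Pointwise values of S_l(f)**2, following (8.65)."""
    prev = expect(values, 0)
    total = [v * v for v in prev]
    for j in range(1, l + 1):
        cur = expect(values, j)
        total = [t + (c - q) ** 2 for t, c, q in zip(total, cur, prev)]
        prev = cur
    return total


def integral(values):
    """Integral over [0, 1) of a step function on equal-length cells."""
    return sum(values, Fraction(0)) / len(values)


f = [Fraction(v) for v in (3, -1, 4, 1, -5, 9, 2, 6)]
for l in range(4):
    lhs = integral(square_function_sq(f, l))
    rhs = integral([v * v for v in expect(f, l)])
    print(l, lhs == rhs)                      # (8.72) for l = 0, 1, 2, 3
```

Since this $f$ is constant on dyadic intervals of length $2^{-3}$, the case $l = 3$ is also an instance of (8.73).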

8.4 Estimates, 1 

Proposition 8.75 If $0 < p < 2$, then there is a positive real number $C_1(p)$ such
that

(8.76)    $\int_{[0,1)} S(f)(x)^p\,dx \le C_1(p) \int_{[0,1)} M(f)(x)^p\,dx$

for any function $f$ on [0, 1).

Let $p < 2$, a function $f$ on [0, 1), and $\lambda > 0$ be given, and consider

(8.77)    $|\{x \in [0, 1) : S(f)(x) > \lambda\}|.$

Let $\mathcal{F}$ denote the set of dyadic subintervals $J$ of [0, 1) such that

(8.78)    $\Bigl| \frac{1}{|J|} \int_J f(y)\,dy \Bigr| > \lambda.$

If $\mathcal{F}_0$ is the set of maximal intervals in $\mathcal{F}$, then

(8.79)    $\bigcup_{J \in \mathcal{F}_0} J = \bigcup_{J \in \mathcal{F}} J$
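and

(8.80)    $J_1 \cap J_2 = \emptyset$ when $J_1, J_2 \in \mathcal{F}_0$, $J_1 \ne J_2$,

as in Lemma 8.5.

Suppose that [0, 1) is not an element of $\mathcal{F}_0$, and let $\mathcal{F}_1$ be the set of dyadic
subintervals $L$ of [0, 1) for which there is a $J \in \mathcal{F}_0$ such that

(8.81)    $J \subseteq L$ and $|J| = |L|/2$.

Because $\mathcal{F}_0$ consists of maximal intervals in $\mathcal{F}$, each $L$ in $\mathcal{F}_1$ does not lie in $\mathcal{F}$,
and hence

(8.82)    $\Bigl| \frac{1}{|L|} \int_L f(y)\,dy \Bigr| \le \lambda$

for every $L \in \mathcal{F}_1$.

The elements of $\mathcal{F}_1$ need not be disjoint, and so we let $\mathcal{F}_{10}$ be the set of
maximal elements of $\mathcal{F}_1$. As usual,

(8.83)    $\bigcup_{L \in \mathcal{F}_{10}} L = \bigcup_{L \in \mathcal{F}_1} L$

and

(8.84)    $L_1 \cap L_2 = \emptyset$ when $L_1, L_2 \in \mathcal{F}_{10}$, $L_1 \ne L_2$.

Let $f_\lambda(x)$ be the function on [0, 1) defined by

(8.85)    $f_\lambda(x) = \frac{1}{|L|} \int_L f(y)\,dy$ when $x \in L$, $L \in \mathcal{F}_{10}$,
                  $= f(x)$ when $x \in [0, 1) \setminus \bigl( \bigcup_{I \in \mathcal{F}_{10}} I \bigr)$.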
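
The construction of $\mathcal{F}$, $\mathcal{F}_0$, $\mathcal{F}_1$, $\mathcal{F}_{10}$ and $f_\lambda$ can be traced on a small example. The sketch below makes two assumptions for illustration: a dyadic step function is stored as the list of its $2^n$ values on the dyadic intervals of length $2^{-n}$, and a pair (k, i) stands for the interval $[i 2^{-k}, (i+1) 2^{-k})$. All function names are hypothetical.

```python
from fractions import Fraction


def average(values, k, i):
    """Average of the step function over the dyadic interval (k, i)."""
    n = len(values).bit_length() - 1
    block = 2 ** (n - k)
    return sum(values[i * block:(i + 1) * block], Fraction(0)) / block


def maximal_elements(family):
    """Elements of `family` contained in no other element of it."""
    def contains(outer, inner):
        return (inner[0] >= outer[0]
                and (inner[1] >> (inner[0] - outer[0])) == outer[1])
    return {J for J in family
            if not any(K != J and contains(K, J) for K in family)}


def f_lambda(values, lam):
    """Build the function of (8.85) for a step function on a 2**n grid."""
    n = len(values).bit_length() - 1
    # Intervals shorter than 2**-n have the same average as their ancestor
    # of length 2**-n, so they cannot be maximal and may be ignored.
    F = {(k, i) for k in range(n + 1) for i in range(2 ** k)
         if abs(average(values, k, i)) > lam}
    F0 = maximal_elements(F)
    if (0, 0) in F0:
        raise ValueError("[0, 1) lies in F_0; the text treats this case separately")
    F1 = {(k - 1, i >> 1) for (k, i) in F0}      # dyadic parents, as in (8.81)
    F10 = maximal_elements(F1)
    out = list(values)
    for (k, i) in F10:
        block = 2 ** (n - k)
        out[i * block:(i + 1) * block] = [average(values, k, i)] * block
    return out


f = [Fraction(v) for v in (3, -1, 4, 1, -5, 9, 2, 6)]
print(f_lambda(f, 4))   # values 3, -1, 4, 1, 2, 2, 4, 4
```

For this sample with $\lambda = 4$, the intervals in $\mathcal{F}_{10}$ are $[1/2, 3/4)$ and $[3/4, 1)$, and $f$ is replaced there by its averages 2 and 4, neither of which exceeds $\lambda$ in absolute value, consistent with (8.82).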

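Lemma 8.86 If $K$ is a dyadic subinterval of [0, 1) such that

(8.87)    $K \setminus \bigl( \bigcup_{I \in \mathcal{F}_{10}} I \bigr) \ne \emptyset$

or $L \subseteq K$ for some $L \in \mathcal{F}_{10}$, then

(8.88)    $\frac{1}{|K|} \int_K f(u)\,du = \frac{1}{|K|} \int_K f_\lambda(u)\,du.$

Under these conditions, $K$ is the disjoint union of the intervals $L \in \mathcal{F}_{10}$ such
that $L \subseteq K$ and $K \setminus \bigl( \bigcup_{I \in \mathcal{F}_{10}} I \bigr)$. The integral of $f$ over $K$ is equal to the sum
of the integrals of $f$ over these sets, which is the same as the integral of $f_\lambda$ over
$K$.

Corollary 8.89 If $x \in [0, 1) \setminus \bigl( \bigcup_{L \in \mathcal{F}_{10}} L \bigr)$, then $S(f)(x) = S(f_\lambda)(x)$.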
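For these $x$'s, Lemma 8.86 implies that $E_j(f)(x) = E_j(f_\lambda)(x)$ for every
nonnegative integer $j$, and hence $S(f)(x) = S(f_\lambda)(x)$.

Using the corollary, it is easy to see that

(8.90)    $|\{x \in [0, 1) : S(f)(x) > \lambda\}| \le \sum_{L \in \mathcal{F}_{10}} |L| + |\{x \in [0, 1) : S(f_\lambda)(x) > \lambda\}|.$

For each $L \in \mathcal{F}_{10}$, there is a $J \in \mathcal{F}_0$ such that $J \subseteq L$ and $|J| = |L|/2$, and this
leads to

(8.91)    $\sum_{L \in \mathcal{F}_{10}} |L| \le 2 \sum_{J \in \mathcal{F}_0} |J|.$

By (8.79) and (8.80),

(8.92)    $\sum_{L \in \mathcal{F}_{10}} |L| \le 2\, \Bigl| \bigcup_{J \in \mathcal{F}_0} J \Bigr| = 2\, \Bigl| \bigcup_{J \in \mathcal{F}} J \Bigr|.$

As in (8.41),

(8.93)    $\bigcup_{J \in \mathcal{F}} J = \{x \in [0, 1) : M(f)(x) > \lambda\}.$

Therefore

(8.94)    $\sum_{L \in \mathcal{F}_{10}} |L| \le 2\, |\{x \in [0, 1) : M(f)(x) > \lambda\}|,$

and hence

(8.95)    $|\{x \in [0, 1) : S(f)(x) > \lambda\}|$
                  $\le 2\, |\{x \in [0, 1) : M(f)(x) > \lambda\}| + |\{x \in [0, 1) : S(f_\lambda)(x) > \lambda\}|.$

By Lemma 8.71,

(8.96)    $\lambda^2\, |\{x \in [0, 1) : S(f_\lambda)(x) > \lambda\}| \le \int_{[0,1)} S(f_\lambda)(x)^2\,dx = \int_{[0,1)} |f_\lambda(x)|^2\,dx.$

Thus

(8.97)    $|\{x \in [0, 1) : S(f)(x) > \lambda\}|$
                  $\le 2\, |\{x \in [0, 1) : M(f)(x) > \lambda\}| + \lambda^{-2} \int_{[0,1)} |f_\lambda(x)|^2\,dx.$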

Lemma 8.98 $|f_\lambda| \le \min(\lambda, M(f))$.

Indeed, for each dyadic subinterval $I$ of [0, 1), we have that

(8.99)    $\Bigl| \frac{1}{|I|} \int_I f(y)\,dy \Bigr| \le M(f)(x)$

automatically when $x \in I$, and

(8.100)    $\Bigl| \frac{1}{|I|} \int_I f(y)\,dy \Bigr| \le \lambda$

when $I \in \mathcal{F}_1$ or $x \in I \setminus \bigl( \bigcup_{L \in \mathcal{F}_{10}} L \bigr)$, since $I \notin \mathcal{F}$ in these two cases. With the
help of Lemma 8.86, one can actually get the stronger estimate

(8.101)    $M(f_\lambda) \le \min(\lambda, M(f)).$

Because of the lemma, we may replace (8.97) with

(8.102)    $|\{x \in [0, 1) : S(f)(x) > \lambda\}|$
                  $\le 2\, |\{x \in [0, 1) : M(f)(x) > \lambda\}| + \lambda^{-2} \int_{[0,1)} \min(\lambda, M(f)(x))^2\,dx.$

At the beginning of this argument, just before (8.81), we assumed that [0, 1) is
not an element of $\mathcal{F}_0$. If [0, 1) is an element of $\mathcal{F}_0 \subseteq \mathcal{F}$, then

(8.103)    $\Bigl| \int_{[0,1)} f(y)\,dy \Bigr| > \lambda,$

$M(f)(x) > \lambda$ for every $x \in [0, 1)$, and hence

(8.104)    $|\{x \in [0, 1) : S(f)(x) > \lambda\}| \le |\{x \in [0, 1) : M(f)(x) > \lambda\}|.$

Thus (8.102) holds in general. By Lemma 8.55,

(8.105)    $\int_{[0,1)} S(f)(x)^p\,dx = \int_0^\infty p \lambda^{p-1}\, |\{x \in [0, 1) : S(f)(x) > \lambda\}|\,d\lambda$

and

(8.106)    $\int_{[0,1)} M(f)(x)^p\,dx = \int_0^\infty p \lambda^{p-1}\, |\{x \in [0, 1) : M(f)(x) > \lambda\}|\,d\lambda.$

Therefore

(8.107)    $\int_{[0,1)} S(f)(x)^p\,dx$
                  $\le 2 \int_{[0,1)} M(f)(x)^p\,dx + \int_0^\infty p \lambda^{p-3} \int_{[0,1)} \min(\lambda, M(f)(x))^2\,dx\,d\lambda.$

We can interchange the order of integration and replace the second term on the
right side of (8.107) with

(8.108)    $\int_{[0,1)} \int_0^\infty p \lambda^{p-3}\, \min(\lambda, M(f)(x))^2\,d\lambda\,dx.$
The integral in $\lambda$ can be computed exactly, since

(8.109)    $\int_{M(f)(x)}^\infty p \lambda^{p-3}\, M(f)(x)^2\,d\lambda = \frac{p}{2 - p}\, M(f)(x)^p$

and

(8.110)    $\int_0^{M(f)(x)} p \lambda^{p-3}\, \lambda^2\,d\lambda = M(f)(x)^p.$

Proposition 8.75 now follows by using these formulae in (8.107).

The coefficient $(2 - p)^{-1}$ in the previous computations is not very nice, and
one can get bounded constants for $p$ near 2 using interpolation arguments. One
can also start with an estimate for $p = 4$ instead of $p = 2$, as in Section 8.10,
and use the same method as here to get estimates for $0 < p < 4$ that remain
bounded for $p$ near 2.

Proposition 8.111 (Weak-type estimate for S(f)) For any function $f$ on
[0, 1) and $\lambda > 0$,

(8.112)    $|\{x \in [0, 1) : S(f)(x) > \lambda\}| \le \frac{3}{\lambda} \int_{[0,1)} |f(x)|\,dx.$

This follows from practically the same arguments as above. By (8.97) and
Lemma 8.98, which gives $|f_\lambda|^2 \le \lambda\, |f_\lambda|$,

(8.113)    $|\{x \in [0, 1) : S(f)(x) > \lambda\}|$
                  $\le 2\, |\{x \in [0, 1) : M(f)(x) > \lambda\}| + \lambda^{-1} \int_{[0,1)} |f_\lambda(x)|\,dx,$

where $f_\lambda(x)$ is as in (8.85). To get (8.112), one can use Proposition 8.38 and
the observation that

(8.114)    $\int_{[0,1)} |f_\lambda(x)|\,dx \le \int_{[0,1)} |f(x)|\,dx.$

8.5 Estimates, 2 

Proposition 8.115 If $0 < p < 2$, then there is a positive real number $C_2(p)$
such that

(8.116)    $\int_{[0,1)} M(f)(x)^p\,dx \le C_2(p) \int_{[0,1)} S(f)(x)^p\,dx$

for any function $f$ on [0, 1).

Let $p < 2$ and $f$ be given, and let $\lambda > 0$ be a positive real number.

Lemma 8.117 The set

(8.118)    $\{x \in [0, 1) : S(f)(x) > \lambda\}$

is a union of dyadic subintervals of [0, 1).
If $w \in [0, 1)$ and

(8.119)    $S(f)(w) = \Bigl( |E_0(f)(w)|^2 + \sum_{j=1}^\infty |E_j(f)(w) - E_{j-1}(f)(w)|^2 \Bigr)^{1/2} > \lambda,$

then

(8.120)    $\Bigl( |E_0(f)(w)|^2 + \sum_{j=1}^l |E_j(f)(w) - E_{j-1}(f)(w)|^2 \Bigr)^{1/2} > \lambda$

for some $l$. Let $I$ be the dyadic subinterval of [0, 1) such that $|I| = 2^{-l}$ and
$w \in I$. Because $E_j(f)$ is constant on dyadic intervals of length $2^{-j}$, $E_j(f)(y) =
E_j(f)(w)$ when $j \le l$ and $y \in I$, and hence

(8.121)    $\Bigl( |E_0(f)(y)|^2 + \sum_{j=1}^l |E_j(f)(y) - E_{j-1}(f)(y)|^2 \Bigr)^{1/2} > \lambda$

for every $y \in I$. Therefore $S(f)(y) > \lambda$ for every $y \in I$, and the lemma follows
easily.

Let $\mathcal{G}_0$ be the collection of maximal dyadic subintervals of [0, 1) contained
in the set (8.118). As usual,

(8.122)    $\bigcup_{J \in \mathcal{G}_0} J = \{x \in [0, 1) : S(f)(x) > \lambda\},$

and the intervals in $\mathcal{G}_0$ are pairwise disjoint. In particular,

(8.123)    $\sum_{J \in \mathcal{G}_0} |J| = |\{x \in [0, 1) : S(f)(x) > \lambda\}|.$

Suppose that (8.118) is not equal to the whole unit interval [0, 1). Let $\mathcal{G}_1$ be
the collection of dyadic subintervals $L$ of [0, 1) for which there is a $J \in \mathcal{G}_0$ such
that

(8.124)    $J \subseteq L$ and $|J| = |L|/2$.

Because the elements of $\mathcal{G}_0$ are maximal dyadic intervals contained in (8.118),
each $L \in \mathcal{G}_1$ is not a subset of (8.118). Thus for each $L \in \mathcal{G}_1$ there is a point
$u \in L$ such that $S(f)(u) \le \lambda$. If $\ell(L)$ denotes the nonnegative integer such that
$2^{-\ell(L)} = |L|$, then

(8.125)    $\Bigl( |E_0(f)(u)|^2 + \sum_{j=1}^{\ell(L)} |E_j(f)(u) - E_{j-1}(f)(u)|^2 \Bigr)^{1/2} \le \lambda,$

where the sum on the left is interpreted as being 0 if $\ell(L) = 0$. More precisely,
this inequality holds for at least one $u \in L$, and hence at every $u \in L$, because
$E_j(f)$ is constant on $L$ when $j \le \ell(L)$.
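The intervals in $\mathcal{G}_1$ need not be pairwise disjoint, and we can pass to the
subcollection $\mathcal{G}_{10}$ of maximal elements of $\mathcal{G}_1$ to get

(8.126)    $\bigcup_{L \in \mathcal{G}_{10}} L = \bigcup_{L \in \mathcal{G}_1} L$

and $L_1 \cap L_2 = \emptyset$ when $L_1, L_2 \in \mathcal{G}_{10}$ and $L_1 \ne L_2$. The definition of $\mathcal{G}_1$ implies
that $\bigcup_{J \in \mathcal{G}_0} J \subseteq \bigcup_{L \in \mathcal{G}_1} L$, and therefore

(8.127)    $\{x \in [0, 1) : S(f)(x) > \lambda\} \subseteq \bigcup_{L \in \mathcal{G}_{10}} L.$

Also,

(8.128)    $\sum_{L \in \mathcal{G}_{10}} |L| \le \sum_{J \in \mathcal{G}_0} 2\, |J| = 2\, |\{x \in [0, 1) : S(f)(x) > \lambda\}|.$

Let $g_\lambda(x)$ be the function defined on [0, 1) by

(8.129)    $g_\lambda(x) = \frac{1}{|L|} \int_L f(y)\,dy$ when $x \in L$, $L \in \mathcal{G}_{10}$,
                  $= f(x)$ when $x \in [0, 1) \setminus \bigl( \bigcup_{I \in \mathcal{G}_{10}} I \bigr)$.

Lemma 8.130 If $K$ is a dyadic subinterval of [0, 1) such that

(8.131)    $K \setminus \bigl( \bigcup_{I \in \mathcal{G}_{10}} I \bigr) \ne \emptyset$

or $L \subseteq K$ for some $L \in \mathcal{G}_{10}$, then

(8.132)    $\frac{1}{|K|} \int_K g_\lambda(u)\,du = \frac{1}{|K|} \int_K f(u)\,du.$

This uses the fact that $K$ is the disjoint union of the $L \in \mathcal{G}_{10}$ with $L \subseteq K$
and $K \setminus \bigl( \bigcup_{I \in \mathcal{G}_{10}} I \bigr)$, as in Lemma 8.86.

Corollary 8.133 If $x \in [0, 1) \setminus \bigl( \bigcup_{I \in \mathcal{G}_{10}} I \bigr)$, then $M(f)(x) = M(g_\lambda)(x)$.

Corollary 8.134 If $x \in [0, 1) \setminus \bigl( \bigcup_{J \in \mathcal{G}_{10}} J \bigr)$, then $S(f)(x) = S(g_\lambda)(x)$. If $L \in
\mathcal{G}_{10}$, $v \in L$, and $2^{-\ell(L)} = |L|$, then

(8.135)    $S(g_\lambda)(v) = \Bigl( |E_0(f)(v)|^2 + \sum_{j=1}^{\ell(L)} |E_j(f)(v) - E_{j-1}(f)(v)|^2 \Bigr)^{1/2}.$

These two corollaries follow from Lemma 8.130 and the relevant definitions.

Corollary 8.136 $S(g_\lambda) \le \min(\lambda, S(f))$.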
Corollary 8.134 implies that $S(g_\lambda) \le S(f)$, and we get $S(g_\lambda) \le \lambda$ using also
(8.125) and (8.127).

Because of Corollary 8.133,

(8.137)    $\{x \in [0, 1) : M(f)(x) > \lambda\} \subseteq \bigl( \bigcup_{I \in \mathcal{G}_{10}} I \bigr) \cup \{x \in [0, 1) : M(g_\lambda)(x) > \lambda\},$

and hence

(8.138)    $|\{x \in [0, 1) : M(f)(x) > \lambda\}| \le \Bigl( \sum_{I \in \mathcal{G}_{10}} |I| \Bigr) + |\{x \in [0, 1) : M(g_\lambda)(x) > \lambda\}|.$

Therefore

(8.139)    $|\{x \in [0, 1) : M(f)(x) > \lambda\}|$
                  $\le 2\, |\{x \in [0, 1) : S(f)(x) > \lambda\}| + |\{x \in [0, 1) : M(g_\lambda)(x) > \lambda\}|,$

by (8.128). Of course,

(8.140)    $|\{x \in [0, 1) : M(g_\lambda)(x) > \lambda\}| \le \lambda^{-2} \int_{[0,1)} M(g_\lambda)(u)^2\,du,$

and

(8.141)    $\int_{[0,1)} M(g_\lambda)(u)^2\,du \le C \int_{[0,1)} |g_\lambda(y)|^2\,dy$

for some $C > 0$, by Proposition 8.58. Moreover,

(8.142)    $\int_{[0,1)} |g_\lambda(y)|^2\,dy = \int_{[0,1)} S(g_\lambda)(w)^2\,dw \le \int_{[0,1)} \min(\lambda, S(f)(w))^2\,dw,$

by Lemma 8.71 and Corollary 8.136. It follows that

(8.143)    $|\{x \in [0, 1) : M(g_\lambda)(x) > \lambda\}| \le C \lambda^{-2} \int_{[0,1)} \min(\lambda, S(f)(w))^2\,dw,$

and consequently

(8.144)    $|\{x \in [0, 1) : M(f)(x) > \lambda\}|$
                  $\le 2\, |\{x \in [0, 1) : S(f)(x) > \lambda\}| + C \lambda^{-2} \int_{[0,1)} \min(\lambda, S(f)(w))^2\,dw.$

We assumed near the beginning of the argument that the set (8.118) is not
all of [0, 1). If it is, then the preceding inequality holds trivially. The rest of the
proof of Proposition 8.115 proceeds via computations like those in the previous
section.

8.6 Duality, 1

If $f_1$, $f_2$ are functions on [0, 1), and $l$ is a nonnegative integer, then

(8.145)    $\int_{[0,1)} E_l(f_1)\, E_l(f_2)\,dx$
                  $= \int_{[0,1)} \Bigl( E_0(f_1)\, E_0(f_2) + \sum_{j=1}^l \bigl(E_j(f_1) - E_{j-1}(f_1)\bigr) \bigl(E_j(f_2) - E_{j-1}(f_2)\bigr) \Bigr)\,dx,$

where the sum is interpreted as being 0 when $l = 0$. This is a "bilinear" version
of (8.72) in Lemma 8.71, which can be verified in essentially the same way. By
the Cauchy-Schwarz inequality for sums,

(8.146)    $\Bigl| \int_{[0,1)} E_l(f_1)\, E_l(f_2)\,dx \Bigr| \le \int_{[0,1)} S_l(f_1)\, S_l(f_2)\,dx,$

and for suitable functions $f_1$ and $f_2$,

(8.147)    $\Bigl| \int_{[0,1)} f_1(x)\, f_2(x)\,dx \Bigr| \le \int_{[0,1)} S(f_1)(x)\, S(f_2)(x)\,dx.$

Proposition 8.148 For each $q \ge 2$, there is a $C_3(q) > 0$ such that

(8.149)    $\int_{[0,1)} |f(x)|^q\,dx \le C_3(q) \int_{[0,1)} S(f)(x)^q\,dx.$

Let $q > 2$ be given, and let $p$, $1 < p < \infty$, be the exponent dual to $q$, so that
$1/p + 1/q = 1$ and $p < 2$. By Hölder's inequality,

(8.150)    $\Bigl| \int_{[0,1)} f_1(x)\, f_2(x)\,dx \Bigr| \le \Bigl( \int_{[0,1)} S(f_1)(y)^q\,dy \Bigr)^{1/q} \Bigl( \int_{[0,1)} S(f_2)(w)^p\,dw \Bigr)^{1/p}.$

Propositions 8.58 and 8.75 yield

(8.151)    $\Bigl| \int_{[0,1)} f_1(x)\, f_2(x)\,dx \Bigr| \le C\, \Bigl( \int_{[0,1)} S(f_1)(y)^q\,dy \Bigr)^{1/q} \Bigl( \int_{[0,1)} |f_2(w)|^p\,dw \Bigr)^{1/p}$

for some $C > 0$. In general, if

(8.152)    $\Bigl| \int_{[0,1)} f_1(x)\, f_2(x)\,dx \Bigr| \le A\, \Bigl( \int_{[0,1)} |f_2(w)|^p\,dw \Bigr)^{1/p}$

for some $A \ge 0$ and arbitrary functions $f_2$ on [0, 1), then

(8.153)    $\Bigl( \int_{[0,1)} |f_1(x)|^q\,dx \Bigr)^{1/q} \le A,$

and the proposition follows.

8.7 Duality, 2

Proposition 8.154 For each $q \ge 2$, there is a $C_4(q) > 0$ such that

(8.155)    $\int_{[0,1)} S(f)(x)^q\,dx \le C_4(q) \int_{[0,1)} |f(x)|^q\,dx.$

Let $q > 2$ be given, and let $p$ be the conjugate exponent to $q$. It suffices to
show that the proposition holds with $S(f)$ replaced with $S_l(f)$ for every $l$, with
a constant that does not depend on $l$. To do this, it is enough to show that

(8.156)    $\Bigl| \int_{[0,1)} \Bigl( a_0(x)\, E_0(f)(x) + \sum_{j=1}^l a_j(x)\, \bigl(E_j(f)(x) - E_{j-1}(f)(x)\bigr) \Bigr)\,dx \Bigr|$

is less than or equal to a constant times the product of

(8.157)    $\Bigl( \int_{[0,1)} |f(y)|^q\,dy \Bigr)^{1/q}$

and

(8.158)    $\Bigl( \int_{[0,1)} \Bigl( \sum_{j=0}^l |a_j(w)|^2 \Bigr)^{p/2}\,dw \Bigr)^{1/p}$

for arbitrary functions $a_0, a_1, \ldots, a_l$ on [0, 1). By Lemma 8.11, (8.156) is equal to

(8.159)    $\Bigl| \int_{[0,1)} \Bigl( E_0(a_0)(x) + \sum_{j=1}^l \bigl(E_j(a_j)(x) - E_{j-1}(a_j)(x)\bigr) \Bigr)\, f(x)\,dx \Bigr|.$

Hölder's inequality implies that this is less than or equal to the product of
(8.157) and

(8.160)    $\Bigl( \int_{[0,1)} \Bigl| E_0(a_0)(x) + \sum_{j=1}^l \bigl(E_j(a_j)(x) - E_{j-1}(a_j)(x)\bigr) \Bigr|^p\,dx \Bigr)^{1/p}.$

Thus we would like to show that (8.160) is less than or equal to a constant times
(8.158). Proposition 8.115 implies that (8.160) is bounded by a constant times

(8.161)    $\Bigl( \int_{[0,1)} S\Bigl( E_0(a_0) + \sum_{j=1}^l \bigl(E_j(a_j) - E_{j-1}(a_j)\bigr) \Bigr)(x)^p\,dx \Bigr)^{1/p},$

and so we would like to show that (8.161) is bounded by a constant times (8.158).
One can check that

(8.162)    $S\Bigl( E_0(a_0) + \sum_{j=1}^l \bigl(E_j(a_j) - E_{j-1}(a_j)\bigr) \Bigr)(x)$
is equal to

(8.163)    $\Bigl( |E_0(a_0)(x)|^2 + \sum_{j=1}^l |E_j(a_j)(x) - E_{j-1}(a_j)(x)|^2 \Bigr)^{1/2}.$

It therefore remains to show that

(8.164)    $\Bigl( \int_{[0,1)} \Bigl( |E_0(a_0)(x)|^2 + \sum_{j=1}^l |E_j(a_j)(x) - E_{j-1}(a_j)(x)|^2 \Bigr)^{p/2}\,dx \Bigr)^{1/p}$

is bounded by a constant times (8.158). This can be done using the results
discussed in the next section.



8.8 Auxiliary estimates 

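Let $l$ be a nonnegative integer, and let $\beta_0(x), \beta_1(x), \ldots, \beta_l(x)$ be nonnegative
functions on [0, 1). Given $p, r \ge 1$, consider the problem of bounding

(8.165)    $\Bigl( \int_{[0,1)} \Bigl( \sum_{j=0}^l E_j(\beta_j)(x)^r \Bigr)^{p/r}\,dx \Bigr)^{1/p}$

by a constant times

(8.166)    $\Bigl( \int_{[0,1)} \Bigl( \sum_{j=0}^l \beta_j(x)^r \Bigr)^{p/r}\,dx \Bigr)^{1/p},$

where the constant does not depend on $l$ or $\beta_0(x), \beta_1(x), \ldots, \beta_l(x)$. If $r = \infty$,
then

(8.167)    $\Bigl( \sum_{j=0}^l \beta_j(x)^r \Bigr)^{1/r}, \quad \Bigl( \sum_{j=0}^l E_j(\beta_j)(x)^r \Bigr)^{1/r}$

should be replaced with

(8.168)    $\max_{0 \le j \le l} \beta_j(x), \quad \max_{0 \le j \le l} E_j(\beta_j)(x),$

as usual.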

Lemma 8.169 For each $p \ge 1$, nonnegative integer $j$, and nonnegative function
$\beta$ on [0, 1),

(8.170)    $\int_{[0,1)} E_j(\beta)(x)^p\,dx \le \int_{[0,1)} \beta(x)^p\,dx.$

If $J$ is any interval in [0, 1), then

(8.171)    $\Bigl( \frac{1}{|J|} \int_J \beta(y)\,dy \Bigr)^p \le \frac{1}{|J|} \int_J \beta(y)^p\,dy,$

by Jensen's inequality. Lemma 8.169 follows by summing this over the dyadic
intervals $J$ of length $2^{-j}$.

Using Lemma 8.169, it is easy to see that (8.165) is less than or equal to
(8.166) when $p = r$. When $r = \infty$, we might as well restrict our attention to
the case where the $\beta_j$'s are all the same, and the question reduces to one about
maximal functions. Lemma 8.37 and Proposition 8.58 yield suitable estimates
for $1 < p \le \infty$.

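Lemma 8.172 Suppose that $1 \le r < p < \infty$, and let $s \in (1, \infty)$ be conjugate to
$p/r$, so that $1/s + r/p = 1$. For each positive real number $A_0$, (8.165) is less than
or equal to $A_0$ times (8.166) for arbitrary nonnegative functions $\beta_0, \beta_1, \ldots, \beta_l$
on [0, 1) if and only if

(8.173)    $\int_{[0,1)} \Bigl( \sum_{j=0}^l E_j(\beta_j)(x)^r \Bigr)\, h(x)\,dx$
                  $\le A_0^r\, \Bigl( \int_{[0,1)} \Bigl( \sum_{j=0}^l \beta_j(y)^r \Bigr)^{p/r}\,dy \Bigr)^{r/p} \Bigl( \int_{[0,1)} h(w)^s\,dw \Bigr)^{1/s}$

for arbitrary nonnegative functions $\beta_0, \beta_1, \ldots, \beta_l$ and $h$ on [0, 1).

This is basically the same observation as in (8.152) and (8.153), applied to
this situation. Let us continue to assume that $1 \le r < p < \infty$, and that $s$ is
conjugate to $p/r$. We would like to show that (8.173) holds for a suitable choice
of $A_0$. Because $r \ge 1$, $E_j(\beta_j)^r \le E_j(\beta_j^r)$, by Jensen's inequality. Hence

(8.174)    $\int_{[0,1)} \Bigl( \sum_{j=0}^l E_j(\beta_j)(x)^r \Bigr)\, h(x)\,dx \le \int_{[0,1)} \Bigl( \sum_{j=0}^l E_j(\beta_j^r)(x) \Bigr)\, h(x)\,dx.$

By Lemma 8.11, this implies that

(8.175)    $\int_{[0,1)} \Bigl( \sum_{j=0}^l E_j(\beta_j)(x)^r \Bigr)\, h(x)\,dx \le \int_{[0,1)} \sum_{j=0}^l \beta_j(x)^r\, E_j(h)(x)\,dx,$

and thus

(8.176)    $\int_{[0,1)} \Bigl( \sum_{j=0}^l E_j(\beta_j)(x)^r \Bigr)\, h(x)\,dx \le \int_{[0,1)} \sum_{j=0}^l \beta_j(x)^r\, M(h)(x)\,dx.$

By Hölder's inequality,

(8.177)    $\int_{[0,1)} \Bigl( \sum_{j=0}^l E_j(\beta_j)(x)^r \Bigr)\, h(x)\,dx$
                  $\le \Bigl( \int_{[0,1)} \Bigl( \sum_{j=0}^l \beta_j(y)^r \Bigr)^{p/r}\,dy \Bigr)^{r/p} \Bigl( \int_{[0,1)} M(h)(w)^s\,dw \Bigr)^{1/s}.$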
It follows from Proposition 8.58 that (8.173) holds for some $A_0 > 0$. This shows
that (8.165) is bounded by a constant times (8.166) when $1 \le r < p < \infty$.

Lemma 8.178 If $1 < p < \infty$, $1 < r < \infty$, and $p'$, $r'$ are the exponents
conjugate to $p$, $r$, respectively, then for each positive real number $A_0$, (8.165)
is less than or equal to $A_0$ times (8.166) for arbitrary nonnegative functions
$\beta_0, \beta_1, \ldots, \beta_l$ on [0, 1) if and only if the same is true with $p$, $r$ replaced by $p'$, $r'$.

This also works for $p, r = 1, \infty$, with minor adjustments of the usual type.
To prove the lemma, the main step is to observe that (8.165) is bounded by
$A_0$ times (8.166) for all nonnegative functions $\beta_0(x), \beta_1(x), \ldots, \beta_l(x)$ on [0, 1) if
and only if

(8.179)    $\int_{[0,1)} \Bigl( \sum_{j=0}^l E_j(\beta_j)(x)\, \gamma_j(x) \Bigr)\,dx$
                  $\le A_0\, \Bigl( \int_{[0,1)} \Bigl( \sum_{j=0}^l \beta_j(x)^r \Bigr)^{p/r}\,dx \Bigr)^{1/p} \Bigl( \int_{[0,1)} \Bigl( \sum_{j=0}^l \gamma_j(x)^{r'} \Bigr)^{p'/r'}\,dx \Bigr)^{1/p'}$

for all nonnegative functions $\beta_0(x), \beta_1(x), \ldots, \beta_l(x)$ and $\gamma_0(x), \gamma_1(x), \ldots, \gamma_l(x)$
on [0, 1). It follows from the lemma and the remarks preceding it that (8.165)
is less than or equal to a constant times (8.166) when $1 < p < r < \infty$.



8.9 Interpolation 

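Let $T$ be a linear operator acting on real or complex-valued dyadic step functions
on [0, 1). Suppose that $1 \le p < q \le \infty$,

(8.180)    $\Bigl( \int_{[0,1)} |T(f)(x)|^p\,dx \Bigr)^{1/p} \le N_p\, \Bigl( \int_{[0,1)} |f(x)|^p\,dx \Bigr)^{1/p},$

and

(8.181)    $\Bigl( \int_{[0,1)} |T(f)(x)|^q\,dx \Bigr)^{1/q} \le N_q\, \Bigl( \int_{[0,1)} |f(x)|^q\,dx \Bigr)^{1/q}$

when $q < \infty$ or

(8.182)    $\sup_{x \in [0,1)} |T(f)(x)| \le N_\infty \sup_{x \in [0,1)} |f(x)|$

when $q = \infty$, for some $N_p, N_q \ge 0$ and each $f$. If $0 < t < 1$ and

(8.183)    $\frac{1}{r} = \frac{t}{p} + \frac{1 - t}{q},$

then

(8.184)    $\Bigl( \int_{[0,1)} |T(f)(x)|^r\,dx \Bigr)^{1/r} \le N_p^t\, N_q^{1-t}\, \Bigl( \int_{[0,1)} |f(x)|^r\,dx \Bigr)^{1/r}.$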
This can be derived from Theorem 7.5, as follows. For each positive integer $l$,

(8.185)    $\Bigl( \int_{[0,1)} |E_l(T(f))(x)|^p\,dx \Bigr)^{1/p} \le N_p\, \Bigl( \int_{[0,1)} |f(x)|^p\,dx \Bigr)^{1/p},$

by (8.180) and Lemma 8.169, and analogously for $q$ instead of $p$. Theorem 7.5 can
be applied to get that

(8.186)    $\Bigl( \int_{[0,1)} |E_l(T(f))(x)|^r\,dx \Bigr)^{1/r} \le N_p^t\, N_q^{1-t}\, \Bigl( \int_{[0,1)} |f(x)|^r\,dx \Bigr)^{1/r}$

for all step functions $f$ on [0, 1) that are constant on dyadic intervals of length
$2^{-l}$, by thinking of $E_l \circ T$ as a linear transformation on that space, which can
be identified with $\mathbf{R}^n$ or $\mathbf{C}^n$, $n = 2^l$, as appropriate. Once one has (8.186) for
step functions that are constant on dyadic intervals of length $2^{-l}$ for every $l$, it
is easy to derive (8.184) for arbitrary dyadic step functions. Of course, one can
extend this to other classes of functions too.

The maximal and square function operators discussed in this chapter are not 
linear, but the same interpolation inequalities can be applied to them. One can 
show this by approximating these operators by linear operators. Suppose that 
T is a not-necessarily-linear operator acting on dyadic step functions on [0, 1) 
such that for each dyadic step function / there is a linear operator A on the 
same space of functions with the properties that 

(8.187)    $|A(h)(x)| \le T(h)(x)$

for every $h$, $x$ and

(8.188)    $T(f)(x) = |A(f)(x)|.$

If $T$ satisfies (8.180) and (8.181) or (8.182), then the analogous inequalities
hold for these approximating linear operators $A$. By interpolation, the
approximating linear operators $A$ satisfy (8.184), and therefore $T$ does too. One can
approximate maximal functions in this way by linear operators of the form

(8.189)    $E_{a(x)}(f)(x),$

where $a(x)$ takes values in nonnegative integers. One can approximate square
functions by linear operators of the form

(8.190)    $a_0(x)\, E_0(f)(x) + \sum_{j=1}^l a_j(x)\, \bigl(E_j(f)(x) - E_{j-1}(f)(x)\bigr),$

where $\bigl( \sum_{j=0}^l |a_j(x)|^2 \bigr)^{1/2} \le 1$.

8.10 Another argument for p = 4 

Let $f$ be a function on [0, 1), and consider

(8.191)    $S(f)(x)^2 = |E_0(f)(x)|^2 + \sum_{j=1}^\infty |E_j(f)(x) - E_{j-1}(f)(x)|^2.$

Put

(8.192)    $R_j(f)(x) = \Bigl( \sum_{k=j}^\infty |E_k(f)(x) - E_{k-1}(f)(x)|^2 \Bigr)^{1/2}$

for each positive integer $j$. Thus

(8.193)    $S(f)(x)^4 = \bigl( |E_0(f)(x)|^2 + R_1(f)(x)^2 \bigr)^2$
                  $= |E_0(f)(x)|^4 + 2\, |E_0(f)(x)|^2\, R_1(f)(x)^2 + R_1(f)(x)^4,$

and

(8.194)    $R_1(f)(x)^4 = \sum_{j=1}^\infty |E_j(f)(x) - E_{j-1}(f)(x)|^4$
                  $+ 2 \sum_{j=1}^\infty |E_j(f)(x) - E_{j-1}(f)(x)|^2\, R_{j+1}(f)(x)^2.$

If $I$ is a dyadic interval of length $2^{-j}$, then

(8.195)    $\int_I \bigl( |E_j(f)(x)|^2 + R_{j+1}(f)(x)^2 \bigr)\,dx = \int_I |f(x)|^2\,dx.$

This is analogous to Lemma 8.71, using orthogonality properties on $I$ analogous
to those in Lemma 8.13 on [0, 1). In particular,

(8.196)    $\int_I R_{j+1}(f)(x)^2\,dx \le \int_I |f(x)|^2\,dx \le \int_I M(|f|^2)(x)\,dx,$

where $M(|f|^2)$ is the dyadic maximal function associated to $|f|^2$. It follows that

(8.197)    $\int_{[0,1)} |E_j(f)(x) - E_{j-1}(f)(x)|^2\, R_{j+1}(f)(x)^2\,dx$
                  $\le \int_{[0,1)} |E_j(f)(x) - E_{j-1}(f)(x)|^2\, M(|f|^2)(x)\,dx$

for each $j \ge 1$, by expressing the integral over [0, 1) as a sum of integrals over
dyadic intervals of length $2^{-j}$, and using the fact that $|E_j(f)(x) - E_{j-1}(f)(x)|^2$
is constant on dyadic intervals of length $2^{-j}$.
Using estimates like these, one can check that

(8.198)    $\int_{[0,1)} S(f)(x)^4\,dx \le C \int_{[0,1)} S(f)(x)^2\, M(|f|^2)(x)\,dx$

for some constant $C > 0$ that does not depend on $f$. This also uses the fact that
$M(f)^2 \le M(|f|^2)$ to deal with the diagonal terms. This gives another way to
estimate the $L^4$ norm of $S(f)$ in terms of the $L^4$ norm of $f$. More precisely, one
can first apply the Cauchy-Schwarz inequality to the right side of (8.198). This
implies that the $L^4$ norm of $S(f)$ is bounded by a constant times the product
of the square root of the $L^4$ norm of $S(f)$ and the fourth root of the $L^2$ norm
of $M(|f|^2)$. Dividing both sides by the square root of the $L^4$ norm of $S(f)$ and
then squaring, one gets that the $L^4$ norm of $S(f)$ is bounded by a constant
times the square root of the $L^2$ norm of $M(|f|^2)$. The latter is bounded by a
constant multiple of the $L^4$ norm of $f$, as desired, because of the $L^2$ estimates
for the maximal function applied to $|f|^2$.



8.11 Rademacher functions 

For each positive integer $j$, the $j$th Rademacher function $r_j$ is the dyadic step
function on [0, 1) which is constant on dyadic intervals of length $2^{-j}$ and whose
values alternate between 1 and $-1$. Thus

(8.199)    $r_j(t) = 1$

when $k\, 2^{-j} \le t < (k + 1)\, 2^{-j}$ and $k$ is an even integer, and

(8.200)    $r_j(t) = -1$

when $k$ is odd. In particular,

(8.201)    $|r_j(t)| = 1$

for each $j$ and $t$. If $I$ is a dyadic subinterval of [0, 1) of length $|I| > 2^{-j}$, then

(8.202)    $\int_I r_j(t)\,dt = 0,$

because the values of $r_j$ alternate between 1 and $-1$ on $I$. If $j$ and $l$ are distinct
positive integers, then

(8.203)    $\int_0^1 r_j(t)\, r_l(t)\,dt = 0.$

Thus the Rademacher functions are orthogonal with respect to the usual integral
inner product

(8.204)    $(f_1, f_2) = \int_0^1 f_1(t)\, f_2(t)\,dt$

for real-valued functions on the unit interval. Since

(8.205)    $\int_0^1 r_j(t)^2\,dt = 1$

for each $j$, the Rademacher functions are orthonormal with respect to this inner
product.
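
The orthonormality relations (8.203) and (8.205) can be confirmed exactly for the first few Rademacher functions. In the sketch below, the representation of $r_j$ by its values on the $2^n$ dyadic intervals of length $2^{-n}$ (with $1 \le j \le n$) is an assumption made for illustration, and `rademacher` and `inner` are hypothetical helper names.

```python
from fractions import Fraction


def rademacher(j, n):
    """Values of r_j on the 2**n dyadic cells of length 2**-n (1 <= j <= n).

    Cell c lies in [k 2**-j, (k+1) 2**-j) with k = c >> (n - j), and
    r_j is 1 when k is even and -1 when k is odd, as in (8.199), (8.200).
    """
    return [1 if (c >> (n - j)) % 2 == 0 else -1 for c in range(2 ** n)]


def inner(u, v):
    """The inner product (8.204) for step functions on a common grid."""
    return Fraction(sum(a * b for a, b in zip(u, v)), len(u))


n = 4
gram = [[inner(rademacher(j, n), rademacher(l, n)) for l in range(1, n + 1)]
        for j in range(1, n + 1)]
for row in gram:
    print([int(entry) for entry in row])   # rows of the 4 x 4 identity matrix
```

The matrix of inner products is the identity, which is another way of stating (8.203) and (8.205) for $j, l \le 4$.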

If $f$ is a real-valued dyadic step function on $[0,1)$ and $p$ is a positive 
real number, then we put 

(8.206)    $\|f\|_p = \bigl(\int_0^1 |f(x)|^p \, dx\bigr)^{1/p}$. 

This is a norm when $p \ge 1$, and a quasinorm when $0 < p < 1$. Jensen's 
inequality implies that 

(8.207)    $\|f\|_p \le \|f\|_q$ 

when $p \le q$. If 

(8.208)    $f = \sum_{j=1}^n a_j \, r_j$ 

is a linear combination of Rademacher functions, then 

(8.209)    $\|f\|_2 = \bigl(\sum_{j=1}^n a_j^2\bigr)^{1/2}$ 

by orthonormality. Hence 

(8.210)    $\|f\|_p \le \bigl(\sum_{j=1}^n a_j^2\bigr)^{1/2} \le \|f\|_q$ 

when $p \le 2 \le q$. It turns out that for each $q > 2$ there is a $B(q) > 0$ 
such that 

(8.211)    $\|f\|_q \le B(q) \bigl(\sum_{j=1}^n a_j^2\bigr)^{1/2}$, 

and for each $p < 2$ there is a $B(p) > 0$ such that 

(8.212)    $\bigl(\sum_{j=1}^n a_j^2\bigr)^{1/2} \le B(p) \, \|f\|_p$. 

These constants do not depend on $n$ or the coefficients $a_1, \ldots, a_n$. By 
contrast, 

(8.213)    $\max_{0 \le t < 1} |f(t)| = \sum_{j=1}^n |a_j|$, 

and hence the analogous statement for $q = +\infty$ does not work. 
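
For small $n$ these norms can be computed directly. The sketch below relies on the fact that, as $t$ runs through the $2^n$ dyadic intervals of length $2^{-n}$, the vector $(r_1(t), \ldots, r_n(t))$ takes each pattern of signs exactly once, so that the integral in (8.206) becomes an average over sign patterns. The coefficients and function names are hypothetical choices for illustration.

```python
from itertools import product

def lp_norm(a, p):
    """||f||_p for f = sum_j a_j r_j, as an average over all 2**n sign patterns."""
    n = len(a)
    total = sum(abs(sum(x * e for x, e in zip(a, eps))) ** p
                for eps in product((1, -1), repeat=n))
    return (total / 2**n) ** (1.0 / p)

def sup_norm(a):
    """Maximum of |f(t)| over [0, 1), again by running through the sign patterns."""
    return max(abs(sum(x * e for x, e in zip(a, eps)))
               for eps in product((1, -1), repeat=len(a)))

a = [3.0, 1.0, 2.0, 0.5]               # hypothetical coefficients a_1, ..., a_4
l2 = sum(x * x for x in a) ** 0.5      # right side of (8.209)
norms = {p: lp_norm(a, p) for p in (0.5, 1, 2, 4, 8)}

for p, value in norms.items():
    print(p, value)
print("l2 sum:", l2, "sup:", sup_norm(a), "sum |a_j|:", sum(abs(x) for x in a))
```

Taking all coefficients equal to $1$ shows why no bound like (8.211) can hold for $q = +\infty$: the maximum of $|f|$ is $n$, while $\bigl(\sum_{j=1}^n a_j^2\bigr)^{1/2} = \sqrt{n}$.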

If $q$ is an even integer, then we can expand $|f(t)|^q$ as a $q$-fold sum of 
products of Rademacher functions. The integral of a product of Rademacher 
functions is $1$ when the corresponding indices are equal in pairs, and is $0$ 
otherwise. This permits one to estimate $\|f\|_q^q$ by a multiple of 

(8.214)    $\bigl(\sum_{j=1}^n a_j^2\bigr)^{q/2}$, 

as desired. Actually, it suffices to know that each index of a Rademacher 
function in a product is equal to at least one other index when the integral of 
the product is different from $0$. If $q > 2$ is not an even integer, then we 
can apply the previous assertion to the smallest even integer $Q > q$ and use 
the monotonicity of $\|f\|_q$ in $q$. One could use Hölder's inequality instead, 
in the form 

(8.215)    $\|f\|_q \le \|f\|_Q^a \, \|f\|_2^{1-a}$, 

where 

(8.216)    $\frac{1}{q} = \frac{a}{Q} + \frac{1-a}{2}$, 

to get a better constant. For $p < 2$, we can use Hölder's inequality in the 
form 

(8.217)    $\|f\|_2 \le \|f\|_p^b \, \|f\|_4^{1-b}$ 

with 

(8.218)    $\frac{1}{2} = \frac{b}{p} + \frac{1-b}{4}$, 

and replace $\|f\|_4$ by a multiple of $\|f\|_2$ to estimate $\|f\|_2$ in terms 
of $\|f\|_p$. 
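
As an illustration of the case $q = 4$, counting the products whose indices are equal in pairs gives $\|f\|_4^4 = 3 \bigl(\sum_j a_j^2\bigr)^2 - 2 \sum_j a_j^4$, so that $B(4) = 3^{1/4}$ works in (8.211). Combining this with Hölder's inequality for $p = 1$, where the exponent is $b = 1/3$, leads to $\|f\|_2 \le \sqrt{3} \, \|f\|_1$. The sketch below checks both statements numerically for one hypothetical choice of integer coefficients; it is a sanity check under these assumptions, not a proof, and the constants obtained this way are not claimed to be optimal.

```python
from itertools import product

def moment(a, q):
    """Integral over [0, 1) of |f(t)|**q for f = sum_j a_j r_j.

    Uses the fact that (r_1(t), ..., r_n(t)) takes each sign pattern
    exactly once on the dyadic intervals of length 2**-n.
    """
    n = len(a)
    total = sum(abs(sum(x * e for x, e in zip(a, eps))) ** q
                for eps in product((1, -1), repeat=n))
    return total / 2**n

a = [2, -1, 3, 1, 1]                   # hypothetical integer coefficients
s2 = sum(x**2 for x in a)              # sum of a_j^2
s4 = sum(x**4 for x in a)              # sum of a_j^4

fourth_moment = moment(a, 4)           # ||f||_4^4
pairing_count = 3 * s2**2 - 2 * s4     # value predicted by counting pairings

norm1 = moment(a, 1)                   # ||f||_1
norm2 = s2 ** 0.5                      # ||f||_2, by (8.209)

print(fourth_moment, pairing_count)
print(norm1, norm2, 3 ** 0.5 * norm1)
```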

One can also see this as a consequence of the analysis of the previous sections. 
If $E_l$ is as defined in (8.7), then $E_l(r_j) = 0$ when $j > l$ and 
$E_l(r_j) = r_j$ when $j \le l$. Hence 

(8.219)    $E_l(r_j) - E_{l-1}(r_j) = 0$ 

when $j \ne l$, and 

(8.220)    $E_j(r_j) - E_{j-1}(r_j) = r_j$. 

Therefore 

(8.221)    $E_j(f) - E_{j-1}(f) = a_j \, r_j$ 

for $j = 1, \ldots, n$, and 

(8.222)    $S(f) = \bigl(\sum_{j=1}^n a_j^2\bigr)^{1/2}$. 

In particular, the square function $S(f)$ is constant on $[0,1)$. 
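
The identities (8.221) and (8.222) can be checked on a small example. The sketch below assumes that $E_l(f)$ replaces $f$ by its average on each dyadic interval of length $2^{-l}$, and that $S(f)^2 = |E_0(f)|^2 + \sum_{l \ge 1} |E_l(f) - E_{l-1}(f)|^2$; both are stated here as assumptions about definitions given earlier in the chapter, and the coefficients are hypothetical. A step function that is constant on dyadic intervals of length $2^{-n}$ is stored as a list of its $2^n$ values.

```python
def rademacher_values(j, n):
    """Values of r_j on the 2**n dyadic intervals of length 2**-n (needs j <= n)."""
    return [1 if (i >> (n - j)) % 2 == 0 else -1 for i in range(2**n)]

def cond_exp(values, l, n):
    """Assumed form of E_l: average over each dyadic interval of length 2**-l."""
    block = 2 ** (n - l)               # number of finest intervals per averaging block
    out = []
    for start in range(0, 2**n, block):
        avg = sum(values[start:start + block]) / block
        out.extend([avg] * block)
    return out

n = 4
a = [3.0, -1.0, 2.0, 0.5]              # hypothetical coefficients a_1, ..., a_n
r = [rademacher_values(j, n) for j in range(1, n + 1)]
f = [sum(a[j] * r[j][i] for j in range(n)) for i in range(2**n)]

E = [cond_exp(f, l, n) for l in range(n + 1)]            # E_0(f), ..., E_n(f)
diffs = [[E[l][i] - E[l - 1][i] for i in range(2**n)]    # E_l(f) - E_{l-1}(f)
         for l in range(1, n + 1)]
S = [(E[0][i] ** 2 + sum(d[i] ** 2 for d in diffs)) ** 0.5
     for i in range(2**n)]

print(sorted(set(S)))                  # the distinct values taken by S(f)
print(sum(x * x for x in a) ** 0.5)    # right side of (8.222)
```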

8.12 Walsh functions 

If $A = \{j_1, \ldots, j_n\}$ is a finite set of positive integers, then the 
Walsh function $w_A$ is the dyadic step function on the unit interval which is 
the product of the Rademacher functions with these indices, i.e., 

(8.223)    $w_A(t) = r_{j_1}(t) \cdots r_{j_n}(t)$. 

This should be interpreted as the constant function equal to $1$ on $[0,1)$ 
when $A = \emptyset$. Thus 

(8.224)    $|w_A(t)| = 1$ 

for each $A$ and $t$, and 

(8.225)    $\int_0^1 w_A(t)^2 \, dt = 1$. 




The Walsh functions are orthonormal with respect to the usual integral inner 
product, because the integral of a product of Rademacher functions on $[0,1)$ 
is nonzero if and only if the indices of the Rademacher functions are equal 
in pairs, as in the previous section. One can check that the Walsh functions 
form an orthonormal basis for the space of all dyadic step functions on $[0,1)$. 
More precisely, the Walsh functions associated to subsets $A$ of 
$\{1, \ldots, n\}$ form an orthonormal basis for the dyadic step functions that 
are constant on dyadic intervals of length $2^{-n}$. Remember that there are 
$2^n$ subsets of $\{1, \ldots, n\}$, which is the same as the number of dyadic 
subintervals of $[0,1)$ of length $2^{-n}$. 
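
For a small value of $n$ the orthonormality and the dimension count can be verified directly. The sketch below stores a step function that is constant on dyadic intervals of length $2^{-n}$ as a list of $2^n$ values; the helper names are hypothetical and chosen only for this illustration.

```python
from itertools import combinations

def rademacher_values(j, n):
    """Values of r_j on the 2**n dyadic intervals of length 2**-n (needs j <= n)."""
    return [1 if (i >> (n - j)) % 2 == 0 else -1 for i in range(2**n)]

def walsh_values(A, n):
    """Values of w_A, the product of r_j over j in A; the empty product is 1."""
    vals = [1] * 2**n
    for j in A:
        vals = [v * s for v, s in zip(vals, rademacher_values(j, n))]
    return vals

def inner(u, v):
    """Integral over [0, 1) of the product of two such step functions."""
    return sum(x * y for x, y in zip(u, v)) / len(u)

n = 3
subsets = [A for k in range(n + 1) for A in combinations(range(1, n + 1), k)]
W = [walsh_values(A, n) for A in subsets]
gram = [[inner(u, v) for v in W] for u in W]

print(len(subsets), 2**n)
print(all(gram[i][j] == (1 if i == j else 0)
          for i in range(len(W)) for j in range(len(W))))
```

Since there are $2^n$ orthonormal functions in a space of dimension $2^n$, they form a basis, which is the counting argument used in the text.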

If $A \subseteq \{1, \ldots, n\}$, then $w_A$ is constant on dyadic intervals of 
length $2^{-n}$, and 

(8.226)    $E_l(w_A) = w_A$ 

when $l \ge n$. If also $n \in A$, then $E_{n-1}(w_A) = 0$, and therefore 

(8.227)    $E_l(w_A) = 0$ 

for each $l < n$. It follows that 

(8.228)    $E_l(w_A) - E_{l-1}(w_A) = 0$ 

when $l \ne n$, and 

(8.229)    $E_n(w_A) - E_{n-1}(w_A) = w_A$. 

The Walsh group can be defined as the set of sequences of $\pm 1$'s, with 
respect to coordinatewise multiplication. Thus the Walsh group is the Cartesian 
product of a sequence of copies of the group with two elements, and is a 
compact Hausdorff topological space with respect to the product topology. The 
group structure is compatible with the topology, so that the Walsh group is a 
commutative topological group. Walsh functions correspond exactly to Fourier 
analysis on this group. 



Appendix A 

Metric spaces 



A metric space is a set $M$ together with a nonnegative real-valued function 
$d(x, y)$ defined for $x, y \in M$ such that $d(x, y) = 0$ if and only if 
$x = y$, 

(A.1)    $d(x, y) = d(y, x)$ 

for every $x, y \in M$, and 

(A.2)    $d(x, z) \le d(x, y) + d(y, z)$ 

for every $x, y, z \in M$. The function $d(x, y)$ is known as the metric on $M$, 
and represents the distance between $x$ and $y$ in the metric space. If $V$ is a 
real or complex vector space equipped with a norm $\|\cdot\|$, then it is easy 
to see that 

(A.3)    $d(v, w) = \|v - w\|$ 

is a metric on $V$. In particular, the standard Euclidean metric on 
$\mathbf{R}^n$ is the metric that corresponds to the standard Euclidean norm on 
$\mathbf{R}^n$ in this way. If we identify $\mathbf{C}^n$ with $\mathbf{R}^{2n}$ 
in the usual way, then the metric on $\mathbf{C}^n$ determined by the standard 
Euclidean norm corresponds exactly to the standard Euclidean metric on 
$\mathbf{R}^{2n}$. 
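
The last statement can be illustrated numerically. The sketch below assumes that the usual identification sends $(z_1, \ldots, z_n)$ to $(\mathrm{Re}\, z_1, \mathrm{Im}\, z_1, \ldots, \mathrm{Re}\, z_n, \mathrm{Im}\, z_n)$; the function names and sample vectors are hypothetical. It also spot-checks the symmetry condition (A.1) and the triangle inequality (A.2) on these samples, which of course does not replace the proof.

```python
import math

def dist_complex(v, w):
    """Standard Euclidean metric on C^n, from the norm (sum |z_j|^2)^(1/2)."""
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(v, w)))

def dist_real(x, y):
    """Standard Euclidean metric on R^m."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def realify(v):
    """Assumed identification of C^n with R^(2n): interleave real and imaginary parts."""
    out = []
    for z in v:
        out.extend([z.real, z.imag])
    return out

u = [0.5 + 0.5j, 2 + 0j, -1 - 1j]      # hypothetical sample points in C^3
v = [1 + 2j, -3j, 0.5 - 1j]
w = [2 - 1j, 1 + 1j, -0.5 + 0j]

print(dist_complex(v, w), dist_real(realify(v), realify(w)))
print(dist_complex(u, w) <= dist_complex(u, v) + dist_complex(v, w))
```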

Let $(M, d(x, y))$ be a metric space. A sequence $\{x_j\}_{j=1}^\infty$ of 
elements of $M$ is said to converge to $x \in M$ if for each $\epsilon > 0$ 
there is an $L \ge 1$ such that 

(A.4)    $d(x_j, x) < \epsilon$ 

for each $j \ge L$. One can check that the limit $x$ of a convergent sequence 
$\{x_j\}_{j=1}^\infty$ is unique when it exists, in which case we put 

(A.5)    $\lim_{j \to \infty} x_j = x$. 

A sequence of elements $\{x_j\}_{j=1}^\infty$ of $M$ is said to be a Cauchy 
sequence if for each $\epsilon > 0$ there is an $L \ge 1$ such that 

(A.6)    $d(x_j, x_l) < \epsilon$ 

for every $j, l \ge L$. One can also check that every convergent sequence in $M$ 
is a Cauchy sequence. 

Conversely, if every Cauchy sequence in $M$ converges to an element of $M$, 
then $M$ is said to be complete. As in Section 1.1, $\mathbf{R}$ and 
$\mathbf{C}$ are complete as metric spaces with respect to their standard 
metrics. This implies that $\mathbf{R}^n$ and $\mathbf{C}^n$ are complete with 
respect to their standard metrics for each positive integer $n$, because a 
sequence of elements of $\mathbf{R}^n$ or $\mathbf{C}^n$ is a Cauchy sequence or 
a convergent sequence if and only if the $n$ corresponding sequences of 
coordinates have the same property. 

Let $E$ be a subset of a metric space $M$. A point $p \in M$ is said to be in 
the closure $\overline{E}$ of $E$ in $M$ if for each $\epsilon > 0$ there is a 
point $q \in E$ such that 

(A.7)    $d(p, q) < \epsilon$. 

If $p \in E$, then one can simply take $q = p$, so that every element of $E$ is 
automatically an element of $\overline{E}$. If 

(A.8)    $\overline{E} = E$, 

then we say that $E$ is a closed set in $M$. 

One can check that the closure of any set in $M$ is closed. If 
$\{x_j\}_{j=1}^\infty$ is a sequence of elements of a subset $E$ of $M$ that 
converges to an element $x$ of $M$, then it is easy to see that 
$x \in \overline{E}$. Conversely, every element of $\overline{E}$ is the limit 
of a sequence of elements of $E$ that converges in $M$. 

If $x$ is an element of a metric space $M$ and $r > 0$, then the closed ball in 
$M$ with center $x$ and radius $r$ is defined by 

(A.9)    $\overline{B}(x, r) = \{y \in M : d(x, y) \le r\}$. 

One can check that this is always a closed set in $M$, using the triangle 
inequality. 

A subset $E$ of a metric space $M$ is said to be dense in $M$ if 
$\overline{E} = M$. This is equivalent to saying that every element of $M$ is 
the limit of a convergent sequence of elements of $E$. The set $\mathbf{Q}$ of 
rational numbers is dense in the real line with the standard metric, for 
instance. 

A subset $E$ of a metric space $M$ is said to be bounded if it is contained in a 
ball, which is to say that 

(A.10)    $E \subseteq \overline{B}(p, r)$ 

for some $p \in M$ and $r > 0$. In this case, we also have that 

(A.11)    $E \subseteq \overline{B}(q, r + d(p, q))$ 

for every $q \in M$, by the triangle inequality. If $E \subseteq M$ is bounded 
and nonempty, then the diameter of $E$ is defined by 

(A.12)    $\mathrm{diam}\, E = \sup\{d(x, y) : x, y \in E\}$. 

One can check that the closure $\overline{E}$ of a bounded set $E \subseteq M$ 
is bounded as well, and that the diameter of $\overline{E}$ is the same as the 
diameter of $E$. 

If $(M, d(x, y))$ is a metric space and $X$ is a subset of $M$, then the 
restriction of the metric $d(x, y)$ on $M$ to $x, y \in X$ satisfies the 
requirements of a metric on $X$, so that $X$ becomes a metric space too. If $M$ 
is complete as a metric space and $X$ is a closed subset of $M$, then $X$ is 
also complete as a metric space. To see this, observe that any Cauchy sequence 
$\{x_j\}_{j=1}^\infty$ in $X$ is a Cauchy sequence in $M$ as well. If $M$ is 
complete, then $\{x_j\}_{j=1}^\infty$ converges to an element $x$ of $M$, and 
$x \in X$ when $X$ is a closed set in $M$. 

Suppose now that $(M, d(x, y))$ and $(N, \rho(u, v))$ are both metric spaces. 
Let $f$ be a function on $M$ with values in $N$, which is the same as a mapping 
from $M$ into $N$, and which may be expressed symbolically by $f : M \to N$. As 
usual, $f$ is said to be continuous at a point $x \in M$ if for every 
$\epsilon > 0$ there is a $\delta > 0$ such that 

(A.13)    $\rho(f(x), f(y)) < \epsilon$ 

for every $y \in M$ such that $d(x, y) < \delta$. If $f$ is continuous at $x$, 
and if $\{x_j\}_{j=1}^\infty$ is a sequence of elements of $M$ that converges to 
$x$, then it is easy to see that $\{f(x_j)\}_{j=1}^\infty$ converges to $f(x)$ 
in $N$. Conversely, if $f$ is not continuous at $x$, then one can check that 
there is an $\epsilon > 0$ and a sequence $\{x_j\}_{j=1}^\infty$ of elements of 
$M$ that converges to $x$ such that 

(A.14)    $\rho(f(x), f(x_j)) \ge \epsilon$ 

for each $j$, so that $\{f(x_j)\}_{j=1}^\infty$ does not converge to $f(x)$ in 
$N$. 

A mapping $f : M \to N$ is said to be continuous if it is continuous at every 
point in $M$. Suppose that $(M_1, d_1)$, $(M_2, d_2)$, and $(M_3, d_3)$ are 
metric spaces, and that $f_1 : M_1 \to M_2$ and $f_2 : M_2 \to M_3$ are 
continuous mappings between them. The composition $f_2 \circ f_1$ is the mapping 
from $M_1$ into $M_3$ defined by 

(A.15)    $(f_2 \circ f_1)(x) = f_2(f_1(x))$ 

for every $x \in M_1$. One can check that $f_2 \circ f_1$ is also a continuous 
mapping from $M_1$ into $M_3$ under these conditions, using either the 
definition of continuity in terms of $\epsilon$'s and $\delta$'s, or the 
characterization of continuity in terms of convergent sequences. 

If $f$ and $g$ are continuous real or complex-valued functions on a metric space 
$M$, then their sum $f + g$ and product $f g$ are also continuous functions on 
$M$. More precisely, when we say that a real or complex-valued function on $M$ 
is continuous, we mean that it is continuous as a mapping into $\mathbf{R}$ or 
$\mathbf{C}$ with its standard metric. To show that $f + g$ and $f g$ are 
continuous, one can use the characterization of continuous functions in terms 
of convergent sequences to reduce to the analogous statements for sums and 
products of convergent sequences of real or complex numbers. Of course, one can 
also prove this more directly, using very similar arguments. 

Let $M$ be a set, and let $(N, \rho(u, v))$ be a metric space. A sequence 
$\{f_j\}_{j=1}^\infty$ of mappings from $M$ into $N$ is said to converge 
pointwise to a mapping $f : M \to N$ if $\{f_j(x)\}_{j=1}^\infty$ converges as a 
sequence of elements of $N$ to $f(x)$ for every $x \in M$. Similarly, 
$\{f_j\}_{j=1}^\infty$ is said to converge to $f$ uniformly on $M$ if for each 
$\epsilon > 0$ there is an $L \ge 1$ such that 

(A.16)    $\rho(f_j(x), f(x)) < \epsilon$ 

for every $j \ge L$ and $x \in M$. The difference between uniform and pointwise 
convergence is that $L$ depends only on $\epsilon$ and not on $x$ in the 
definition of uniform convergence, while $L$ is allowed to depend on both 
$\epsilon$ and $x$ in the analogous formulation of pointwise convergence. If 
$(M, d(x, y))$ is also a metric space, and $\{f_j\}_{j=1}^\infty$ is a sequence 
of continuous mappings from $M$ into $N$ that converges uniformly to a mapping 
$f : M \to N$, then a well-known theorem states that $f$ is also continuous. 
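
A standard example, not taken from the text, may help to separate the two notions: on $M = [0, 1)$ the functions $f_j(x) = x^j$ converge pointwise to $0$, but not uniformly, because $f_j$ takes the value $1/2$ at the point $x_j = 2^{-1/j}$ for every $j$. The sketch below illustrates this numerically; the names `f` and `witness` are hypothetical.

```python
def f(j, x):
    """The j-th function f_j(x) = x**j on [0, 1)."""
    return x ** j

def witness(j):
    """A point x_j in [0, 1) with f_j(x_j) = 1/2, so sup |f_j - 0| >= 1/2."""
    return 0.5 ** (1.0 / j)

indices = (1, 10, 100, 1000)

# Pointwise convergence: at a fixed x the values f_j(x) tend to 0.
x = 0.9
pointwise_values = [f(j, x) for j in indices]

# Failure of uniform convergence: the point x_j moves with j, and the
# value there stays at 1/2, so (A.16) fails for epsilon = 1/2 and every L.
gaps = [f(j, witness(j)) for j in indices]

print(pointwise_values)
print([witness(j) for j in indices])
print(gaps)
```

In the language of the definitions above, the $L$ needed at the point $x$ grows without bound as $x$ approaches $1$.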

A function $f$ on a set $M$ with values in a metric space $N$ is said to be 
bounded if 

(A.17)    $f(M) = \{f(x) : x \in M\}$ 

is a bounded subset of $N$. Note that the sum and product of bounded real or 
complex-valued functions on $M$ are also bounded functions on $M$. If a sequence 
of bounded mappings from $M$ into a metric space $N$ converges uniformly 
to a mapping $f : M \to N$, then it is easy to see that $f$ is also bounded. 

Let $(M, d(x, y))$, $(N, \rho(u, v))$ be metric spaces again, and let 
$C_b(M, N)$ be the collection of bounded continuous mappings from $M$ into $N$. 
One can check that 

(A.18)    $\theta(f, g) = \sup\{\rho(f(x), g(x)) : x \in M\}$ 

defines a metric on $C_b(M, N)$, known as the supremum metric. Note that a 
sequence $\{f_j\}_{j=1}^\infty$ of bounded continuous mappings from $M$ into $N$ 
converges to a bounded continuous mapping $f : M \to N$ with respect to the 
supremum metric if and only if $\{f_j\}_{j=1}^\infty$ converges to $f$ uniformly 
on $M$. 

If $N$ is complete as a metric space, then $C_b(M, N)$ is also complete with 
respect to the supremum metric. More precisely, if $\{f_j\}_{j=1}^\infty$ is a 
sequence of bounded continuous mappings from $M$ into $N$ that is a Cauchy 
sequence with respect to (A.18), then it is easy to see that 
$\{f_j(x)\}_{j=1}^\infty$ is a Cauchy sequence in $N$ for each $x \in M$. If $N$ 
is complete, then it follows that $\{f_j(x)\}_{j=1}^\infty$ converges to an 
element $f(x)$ of $N$ for every $x \in M$. Using the Cauchy condition with 
respect to (A.18), one can check that $\{f_j\}_{j=1}^\infty$ converges to $f$ 
uniformly on $M$, and hence that $f$ is bounded and continuous on $M$. 

Let $(M, d(x, y))$ and $(N, \rho(u, v))$ be metric spaces. A mapping 
$f : M \to N$ is said to be uniformly continuous if for every $\epsilon > 0$ 
there is a $\delta > 0$ such that 

(A.19)    $\rho(f(x), f(y)) < \epsilon$ for every $x, y \in M$ with $d(x, y) < \delta$. 

As before, the composition of two uniformly continuous mappings is uniformly 
continuous, as is the limit of a uniformly convergent sequence of uniformly 
continuous mappings. Similarly, the sum of two uniformly continuous real or 
complex-valued functions is also uniformly continuous, as is the product of a 
uniformly continuous function and a constant. The product of two bounded 
uniformly continuous real or complex-valued functions is uniformly continuous 
as well, but this does not always work without the additional hypothesis of 
boundedness. 
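
A standard counterexample, supplied here for illustration and not taken from the text, is $g(t) = t \cdot t$ on the real line: the factor $t \mapsto t$ is uniformly continuous but unbounded, and for any $\delta > 0$ the points $x = 1/\delta$ and $y = x + \delta/2$ satisfy $|x - y| < \delta$ while $g(y) - g(x) = 1 + \delta^2/4 \ge 1$. Thus (A.19) fails for $\epsilon = 1/2$. The sketch below evaluates this gap for a few values of $\delta$; the function name `gap` is hypothetical.

```python
def gap(delta):
    """g(y) - g(x) for g(t) = t * t, with x = 1/delta and y = x + delta/2.

    The difference is computed as (y - x) * (y + x) to limit rounding error.
    """
    x = 1.0 / delta
    y = x + delta / 2
    return (y - x) * (y + x)

deltas = [10.0 ** (-k) for k in range(1, 6)]   # delta = 0.1, 0.01, ..., 1e-5
gaps = [gap(d) for d in deltas]

for d, g in zip(deltas, gaps):
    print(d, g)
```

However small $\delta$ is taken, the gap stays near $1$, so no single $\delta$ works for $\epsilon = 1/2$.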

Suppose that $E$ is a dense subset of $M$, and that $f$ is a uniformly 
continuous mapping from $E$ into $N$. If $N$ is complete, then there is a unique 
extension of $f$ to a uniformly continuous mapping from $M$ into $N$. To see 
this, let $x$ be any element of $M$, and let $\{x_j\}_{j=1}^\infty$ be a 
sequence of elements of $E$ that converges to $x$, which exists because $E$ is 
dense in $M$. In particular, $\{x_j\}_{j=1}^\infty$ is a Cauchy sequence, and 
one can use the uniform continuity of $f : E \to N$ to show that 
$\{f(x_j)\}_{j=1}^\infty$ is a Cauchy sequence in $N$. If $N$ is complete, then 
it follows that $\{f(x_j)\}_{j=1}^\infty$ converges in $N$. One can also check 
that the limit of this sequence does not depend on the specific choice of the 
sequence $\{x_j\}_{j=1}^\infty$ of elements of $E$ converging to $x$, so that 
this defines a mapping from $M$ into $N$. This new mapping clearly agrees with 
the original one on $E$, and one can use the uniform continuity of the original 
mapping on $E$ to show that the new mapping is uniformly continuous on all of 
$M$. The uniqueness of the extension follows from the fact that two continuous 
mappings from $M$ into $N$ are the same when they agree on a dense set. 

A subset $A$ of $M$ is said to be totally bounded if for each $\epsilon > 0$, 
$A$ can be covered by finitely many balls of radius $\epsilon$ in $M$. It is 
easy to see that totally bounded sets are bounded, and that bounded subsets of 
$\mathbf{R}^n$ with the standard metric are totally bounded. If $f : M \to N$ is 
uniformly continuous and $A \subseteq M$ is totally bounded, then $f(A)$ is 
totally bounded in $N$. 
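
One way to see that a bounded subset of $\mathbf{R}^n$ is totally bounded is to place it inside a cube $[-R, R]^n$ and use the points of a grid with spacing $s = \epsilon / \sqrt{n}$ as centers: every point of the cube then lies within $(s/2) \sqrt{n} = \epsilon / 2$ of some grid point, and there are only finitely many grid points. The sketch below checks this on random sample points; the parameter values and function names are hypothetical, and the grid is just one convenient choice of cover.

```python
import math
import random

def nearest_center(point, R, s):
    """Nearest point of the grid {-R + i * s} in each coordinate, for a point in [-R, R]^n."""
    m = math.ceil(2 * R / s)                       # grid indices run over 0, ..., m
    center = []
    for c in point:
        i = min(max(round((c + R) / s), 0), m)     # nearest index, clamped to the grid
        center.append(-R + i * s)
    return center

def num_centers(n, R, s):
    """Total number of grid points, hence of balls in the cover."""
    return (math.ceil(2 * R / s) + 1) ** n

n, R, eps = 3, 2.0, 0.25                           # hypothetical dimension, cube size, radius
s = eps / math.sqrt(n)                             # grid spacing

random.seed(0)
points = [[random.uniform(-R, R) for _ in range(n)] for _ in range(1000)]
worst = max(math.dist(p, nearest_center(p, R, s)) for p in points)

print(num_centers(n, R, s))                        # finitely many balls
print(worst, eps)                                  # largest observed distance to a center
```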